
classification

  • AlexNet
  • ZFNet : grew out of visualization work into a performance improvement; more a technique than a new model in itself
  • VGGNet : 3 × 3 convolutions
  • GoogLeNet : Inception modules
  • ResNet : residual blocks (a minimal Keras sketch follows this list)
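
For reference, a minimal sketch of a ResNet-style residual block in Keras. The layer sizes are illustrative, not the paper's exact configuration.

import tensorflow as tf

def residual_block(x, filters=64):
    # Two 3x3 convolutions plus an identity skip connection: output = F(x) + x.
    # Assumes the input already has `filters` channels so the shapes match.
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding='same')(y)
    y = tf.keras.layers.Add()([y, shortcut])
    return tf.keras.layers.Activation('relu')(y)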

detection

  • R-CNN (cf. OverFeat) : roughly 30% better detection performance than previous approaches
  • SPPNet : spatial pyramid pooling on a shared feature map
  • Fast R-CNN (an RoI-pooling sketch follows this list)
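
The step SPPNet and Fast R-CNN share is pooling each region proposal on the shared convolutional feature map into a fixed-size tensor. A rough sketch of that idea using TensorFlow's crop_and_resize, which approximates RoI pooling with bilinear resampling; roi_pool and its arguments are illustrative names, not the papers' exact operation.

import tensorflow as tf

def roi_pool(feature_map, boxes, box_indices, output_size=(7, 7)):
    # feature_map: (batch, H, W, C) output of a conv backbone
    # boxes: region proposals as normalized [y1, x1, y2, x2]
    # box_indices: which image in the batch each box belongs to
    return tf.image.crop_and_resize(feature_map, boxes, box_indices, output_size)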

segmentation

  • FCN (a rough sketch follows this list)

  • Convolution preserves both features and their locations
  • Multi-scale training : also used with VGG
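
A rough sketch of the FCN idea: a fully convolutional backbone keeps spatial locations, a 1x1 convolution scores every location per class, and upsampling restores the input resolution. This omits the paper's skip connections, and the class count of 21 is illustrative.

import tensorflow as tf

backbone = tf.keras.applications.VGG16(include_top=False, input_shape=(224, 224, 3))
x = tf.keras.layers.Conv2D(21, 1)(backbone.output)                  # per-location class scores
x = tf.keras.layers.UpSampling2D(32, interpolation='bilinear')(x)   # 7x7 back up to 224x224
fcn = tf.keras.Model(backbone.input, x)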

Three issues still being actively worked on in detection

  • Speed
  • Class imbalance
  • Issues with object scale (size)

Fast R-CNN model

  • Uses the PASCAL VOC dataset
  • Load the annotated dataset
import pandas as pd
import tensorflow as tf
airplane = pd.read_csv('dataset/annotations/airplane.csv', header=None)
airplane.rename({0:'filename',1:'xmin',2:'ymin',3:'xmax',4:'ymax', 5:'target' }, axis=1, inplace=True)
airplane
filename xmin ymin xmax ymax target
0 image_0001.jpg 49 30 349 137 airplane
1 image_0002.jpg 59 35 342 153 airplane
2 image_0003.jpg 47 36 331 135 airplane
3 image_0004.jpg 47 24 342 141 airplane
4 image_0005.jpg 48 18 339 146 airplane
... ... ... ... ... ... ...
795 image_0796.jpg 57 27 356 118 airplane
796 image_0797.jpg 56 25 350 110 airplane
797 image_0798.jpg 59 25 354 110 airplane
798 image_0799.jpg 49 26 347 116 airplane
799 image_0800.jpg 53 27 348 109 airplane

800 rows × 6 columns

face = pd.read_csv('dataset/annotations/face.csv', header=None)
face.rename({0:'filename',1:'xmin',2:'ymin',3:'xmax',4:'ymax', 5:'target' }, axis=1, inplace=True)
face
filename xmin ymin xmax ymax target
0 image_0001.jpg 251 15 444 300 face
1 image_0002.jpg 106 31 296 310 face
2 image_0003.jpg 207 17 385 279 face
3 image_0004.jpg 102 55 303 328 face
4 image_0005.jpg 246 30 446 312 face
... ... ... ... ... ... ...
430 image_0431.jpg 119 16 327 262 face
431 image_0432.jpg 117 14 322 251 face
432 image_0433.jpg 193 24 400 281 face
433 image_0434.jpg 127 13 337 268 face
434 image_0435.jpg 213 20 418 269 face

435 rows × 6 columns

motorcycle = pd.read_csv('dataset/annotations/motorcycle.csv', header=None)
motorcycle.rename({0:'filename',1:'xmin',2:'ymin',3:'xmax',4:'ymax', 5:'target' }, axis=1, inplace=True)
motorcycle
filename xmin ymin xmax ymax target
0 image_0001.jpg 31 19 233 141 motorcycle
1 image_0002.jpg 32 15 232 142 motorcycle
2 image_0003.jpg 30 20 234 143 motorcycle
3 image_0004.jpg 30 15 231 132 motorcycle
4 image_0005.jpg 31 19 232 145 motorcycle
... ... ... ... ... ... ...
793 image_0794.jpg 47 44 218 133 motorcycle
794 image_0795.jpg 44 38 216 135 motorcycle
795 image_0796.jpg 47 40 217 141 motorcycle
796 image_0797.jpg 48 54 211 150 motorcycle
797 image_0798.jpg 42 33 218 140 motorcycle

798 rows × 6 columns

data = pd.concat([airplane, face, motorcycle], ignore_index=True)
data
filename xmin ymin xmax ymax target
0 image_0001.jpg 49 30 349 137 airplane
1 image_0002.jpg 59 35 342 153 airplane
2 image_0003.jpg 47 36 331 135 airplane
3 image_0004.jpg 47 24 342 141 airplane
4 image_0005.jpg 48 18 339 146 airplane
... ... ... ... ... ... ...
2028 image_0794.jpg 47 44 218 133 motorcycle
2029 image_0795.jpg 44 38 216 135 motorcycle
2030 image_0796.jpg 47 40 217 141 motorcycle
2031 image_0797.jpg 48 54 211 150 motorcycle
2032 image_0798.jpg 42 33 218 140 motorcycle

2033 rows × 6 columns

data.target.value_counts().plot.pie()
<AxesSubplot:ylabel='target'>

(pie chart of the class distribution of target)

airplane.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   filename  800 non-null    object
 1   xmin      800 non-null    int64 
 2   ymin      800 non-null    int64 
 3   xmax      800 non-null    int64 
 4   ymax      800 non-null    int64 
 5   target    800 non-null    object
dtypes: int64(4), object(2)
memory usage: 37.6+ KB
ig = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
fig = ig.flow_from_dataframe(airplane, directory='dataset/images/airplane/', y_col='target') 
# class_mode defaults to classification; what we want here is regression on the box coordinates
Found 800 validated image filenames belonging to 1 classes.
rig = ig.flow_from_dataframe(airplane, directory='dataset/images/airplane/', class_mode='raw',
                             y_col=['xmin','ymin','xmax','ymax'], target_size=(224,224))
Found 800 validated image filenames.
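
One caveat: flow_from_dataframe resizes every image to 224×224, but with class_mode='raw' the box coordinates are passed through in the original images' pixel space. If we wanted the regression targets to line up exactly with the resized inputs, one option is to rescale the box columns first. A sketch assuming PIL is available; rescale_boxes is a hypothetical helper, not used in the rest of this notebook.

from PIL import Image
import os

def rescale_boxes(df, image_dir, size=224):
    # Rescale [xmin, ymin, xmax, ymax] from the original image size to size x size.
    df = df.copy()
    df[['xmin', 'ymin', 'xmax', 'ymax']] = df[['xmin', 'ymin', 'xmax', 'ymax']].astype(float)
    for i, row in df.iterrows():
        w, h = Image.open(os.path.join(image_dir, row['filename'])).size
        df.loc[i, ['xmin', 'xmax']] *= size / w
        df.loc[i, ['ymin', 'ymax']] *= size / h
    return df
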
t = next(rig)  # t[0]: batch of resized images, t[1]: box coordinates
import matplotlib.pyplot as plt
import matplotlib.patches as pt
t[0][0][0][0]  # RGB value of the first image's top-left pixel, rescaled to [0, 1]
array([1., 1., 1.], dtype=float32)
import imageio
d = imageio.imread('dataset/images/airplane/image_0003.jpg')
/var/folders/9k/jsf_2t1d6ts48d1mpfj3nxp00000gn/T/ipykernel_58546/1106939017.py:1: DeprecationWarning: Starting with ImageIO v3 the behavior of this function will switch to that of iio.v3.imread. To keep the current behavior (and make this warning dissapear) use `import imageio.v2 as imageio` or call `imageio.v2.imread` directly.
  d = imageio.imread('dataset/images/airplane/image_0003.jpg')
d.shape
(165, 393, 3)
fig, ax = plt.subplots()
ax.imshow(t[0][0])
p = pt.Rectangle((t[1][0][0],t[1][0][1]), t[1][0][2] - t[1][0][0], t[1][0][3] - t[1][0][1], fill=None)
ax.add_patch(p)
<matplotlib.patches.Rectangle at 0x16852f850>

(resized image with its ground-truth bounding box drawn)

vgg = tf.keras.applications.VGG16(include_top=False)
vgg.trainable = False
input_ = tf.keras.Input((224,224,3))
x = tf.keras.applications.vgg16.preprocess_input(input_)
x = vgg(x)
x = tf.keras.layers.GlobalAvgPool2D()(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dense(4)(x)
model = tf.keras.Model(input_, x)
model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 tf.__operators__.getitem (S  (None, 224, 224, 3)      0         
 licingOpLambda)                                                 
                                                                 
 tf.nn.bias_add (TFOpLambda)  (None, 224, 224, 3)      0         
                                                                 
 vgg16 (Functional)          (None, None, None, 512)   14714688  
                                                                 
 global_average_pooling2d (G  (None, 512)              0         
 lobalAveragePooling2D)                                          
                                                                 
 dense (Dense)               (None, 128)               65664     
                                                                 
 dense_1 (Dense)             (None, 4)                 516       
                                                                 
=================================================================
Total params: 14,780,868
Trainable params: 66,180
Non-trainable params: 14,714,688
_________________________________________________________________
tf.keras.utils.plot_model(model, rankdir='LR')

(model architecture diagram from plot_model)

temp = next(rig)
temp[1]
array([[ 56,  29, 354, 141],
       [ 54,  28, 348, 116],
       [ 54,  17, 339, 127],
       [ 57,  32, 343, 163],
       [ 51,  27, 348,  90],
       [ 52,  26, 345,  93],
       [ 52,  27, 340, 113],
       [ 54,  29, 349, 147],
       [ 48,  30, 335, 138],
       [ 46,  30, 344,  96],
       [ 50,  30, 349, 140],
       [ 50,  27, 351, 122],
       [ 56,  38, 339, 100],
       [ 52,  27, 349, 120],
       [ 58,  30, 350, 133],
       [ 54,  29, 332, 124],
       [ 54,  26, 359, 124],
       [ 57,  34, 352, 109],
       [ 57,  36, 348, 142],
       [ 64,  29, 350, 134],
       [ 58,  29, 351,  97],
       [ 60,  33, 357, 136],
       [ 52,  20, 345, 116],
       [ 43,  31, 344, 117],
       [ 48,  28, 344, 116],
       [ 51,  28, 345, 124],
       [ 62,  32, 354, 126],
       [ 66,  37, 347, 136],
       [ 44,  27, 343, 127],
       [ 53,  25, 348, 123],
       [ 56,  31, 346, 135],
       [ 49,  52, 332, 141]])
model(temp[0]).numpy() - temp[1]  # untrained predictions are near zero, so the residuals are roughly the negated targets
array([[ -56.38383585,  -26.92581081, -351.90184021, -141.11110169],
       [ -54.3743335 ,  -25.94344282, -345.94700241, -116.06694156],
       [ -54.39934939,  -14.93723774, -336.96593046, -127.08219719],
       [ -57.385346  ,  -29.93573689, -340.96046686, -163.07269567],
       [ -51.393727  ,  -24.92781806, -345.9256227 ,  -90.11360183],
       [ -52.38001573,  -23.9453249 , -342.95323801,  -93.06148297],
       [ -52.39965969,  -24.93242383, -337.94289875, -113.08780289],
       [ -54.38634133,  -26.91725492, -346.94060588, -147.13520712],
       [ -48.41993988,  -27.91870856, -332.97410989, -138.12125772],
       [ -46.39498848,  -27.92801142, -341.91682529,  -96.13376263],
       [ -50.38537461,  -27.94066787, -346.94728279, -140.06692994],
       [ -50.38688052,  -24.93188739, -348.94598317, -122.06827444],
       [ -56.38336837,  -35.94723868, -336.94879794, -100.06997526],
       [ -52.38028693,  -24.94339895, -346.95658588, -120.06992471],
       [ -58.37741381,  -27.94080377, -347.94192076, -133.06600195],
       [ -54.3893646 ,  -26.92963624, -329.95514679, -124.11062384],
       [ -54.39489698,  -23.93282413, -356.93191791, -124.08716041],
       [ -57.37612855,  -31.94030595, -349.94950581, -109.05800468],
       [ -57.38926929,  -33.93914509, -345.94167423, -142.07492548],
       [ -64.40528035,  -26.92789412, -347.96505427, -134.09602776],
       [ -58.37893188,  -26.94068813, -348.94472647,  -97.07336259],
       [ -60.37311864,  -30.95637631, -354.94604254, -136.04438472],
       [ -52.39563018,  -17.91245365, -342.92767572, -116.11294216],
       [ -43.38988453,  -28.90565705, -341.90992355, -117.12336648],
       [ -48.37204134,  -25.93838024, -341.95470834, -116.07972765],
       [ -51.38468373,  -25.95033455, -342.95255399, -124.05679154],
       [ -62.38642246,  -29.94322205, -351.93779683, -126.07096869],
       [ -66.35655737,  -34.95575094, -344.9392302 , -136.0432446 ],
       [ -44.41924202,  -24.91795015, -340.92508149, -127.12302163],
       [ -53.42388099,  -22.91572809, -345.9529283 , -123.12313113],
       [ -56.3797375 ,  -28.95698285, -343.9449861 , -135.05625021],
       [ -49.41572213,  -49.89358044, -330.00320482, -141.14053494]])
model.compile(loss=tf.keras.losses.MSE)  # regress the four box coordinates with MSE
model.fit(rig, epochs=5)
Epoch 1/5
25/25 [==============================] - 15s 542ms/step - loss: 33033.6016
Epoch 2/5
25/25 [==============================] - 13s 529ms/step - loss: 29837.9766
Epoch 3/5
25/25 [==============================] - 13s 525ms/step - loss: 26341.7500
Epoch 4/5
25/25 [==============================] - 13s 526ms/step - loss: 22538.5625
Epoch 5/5
25/25 [==============================] - 13s 528ms/step - loss: 18658.8418





<keras.callbacks.History at 0x1685adc70>
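
Beyond the raw MSE, one way to judge the regressed boxes is the IoU between predictions and ground truth. A minimal sketch, assuming the model and the rig generator defined above; the iou helper is not part of the notebook.

import numpy as np

def iou(box_a, box_b):
    # Boxes are [xmin, ymin, xmax, ymax]; IoU = intersection area / union area.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

batch_x, batch_y = next(rig)
preds = model(batch_x).numpy()
print(np.mean([iou(p, t) for p, t in zip(preds, batch_y)]))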