  • AlexNet
  • ZFNet : grew out of visualization work into a performance improvement; a technique rather than a new model in itself
  • VGGNet : stacked 3 x 3 convolutions
  • GoogLeNet : Inception module
  • ResNet : residual block
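
The "3 x 3" note for VGGNet can be checked with a little arithmetic: a stack of small kernels covers the same receptive field as one large kernel while using fewer weights. This is a minimal sketch, assuming stride-1 convolutions without bias terms and a hypothetical channel width of 64. The helper names are made up for illustration.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1  # each layer widens the field by (kernel - 1)
    return rf


def conv_weights(kernel, channels):
    """Weights in one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel * kernel * channels * channels


channels = 64  # hypothetical width, chosen only for illustration
three_3x3 = 3 * conv_weights(3, channels)  # three stacked 3 x 3 layers
one_7x7 = conv_weights(7, channels)        # one 7 x 7 layer, same receptive field

print(stacked_receptive_field(3), three_3x3, one_7x7)
```

Three 3 x 3 layers see a 7 x 7 window with 27·C² weights instead of 49·C², and they add two extra non-linearities along the way.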


  • R-CNN (OverFeat) : roughly 30% better than previous results
  • SPPNet : spatial pyramid pooling
  • Fast R-CNN
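
The pooling idea behind SPPNet (and the RoI pooling that Fast R-CNN derives from it) is to turn a feature map of any size into a fixed-length vector. The sketch below is a plain-Python illustration rather than either paper's implementation. The pyramid levels (1, 2, 4) and the function names are assumptions.

```python
def adaptive_max_pool(fmap, bins):
    """Max-pool a 2D feature map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(bins):
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins  # integer ceiling
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            out.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out


def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Concatenate pooled grids from every pyramid level into one vector."""
    vec = []
    for bins in levels:
        vec.extend(adaptive_max_pool(fmap, bins))
    return vec


small = [[r * 5 + c for c in range(5)] for r in range(4)]    # 4 x 5 map
large = [[r * 13 + c for c in range(13)] for r in range(9)]  # 9 x 13 map

# Both maps give a vector of length 1 + 4 + 16 = 21.
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Because the output length depends only on the pyramid levels, the fully connected layers that follow no longer force a fixed input image size.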


  • FCN

  • applying convolution yields both features and locations
  • multi-scale training : also used in VGG
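
Multi-scale training can be sketched as sampling a new target size for every training image while keeping the aspect ratio. The default range of 256 to 512 for the shorter side follows the scale jittering described for VGG. The function name and signature are hypothetical.

```python
import random


def sample_training_size(height, width, s_min=256, s_max=512, rng=random):
    """Pick a random shorter-side length and return the rescaled (height, width)."""
    s = rng.randint(s_min, s_max)   # shorter side for this training step
    scale = s / min(height, width)  # same factor on both axes keeps the aspect ratio
    return round(height * scale), round(width * scale)


random.seed(0)
# (165, 393) is the image shape that appears later in these notes.
new_h, new_w = sample_training_size(165, 393)
print(new_h, new_w)
```

A fixed-size crop is usually taken from the rescaled image afterwards, so the network sees objects at varying sizes from one step to the next.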

Three issues that are still open in detection

  • speed
  • imbalance
  • issues by object size

Fast R-CNN model

  • Uses the PASCAL VOC dataset
  • Load the annotated dataset
import pandas as pd
import tensorflow as tf
airplane = pd.read_csv('dataset/annotations/airplane.csv', header=None)
airplane.rename({0:'filename',1:'xmin',2:'ymin',3:'xmax',4:'ymax', 5:'target' }, axis=1, inplace=True)
filename xmin ymin xmax ymax target
0 image_0001.jpg 49 30 349 137 airplane
1 image_0002.jpg 59 35 342 153 airplane
2 image_0003.jpg 47 36 331 135 airplane
3 image_0004.jpg 47 24 342 141 airplane
4 image_0005.jpg 48 18 339 146 airplane
... ... ... ... ... ... ...
795 image_0796.jpg 57 27 356 118 airplane
796 image_0797.jpg 56 25 350 110 airplane
797 image_0798.jpg 59 25 354 110 airplane
798 image_0799.jpg 49 26 347 116 airplane
799 image_0800.jpg 53 27 348 109 airplane

800 rows × 6 columns

face = pd.read_csv('dataset/annotations/face.csv', header=None)
face.rename({0:'filename',1:'xmin',2:'ymin',3:'xmax',4:'ymax', 5:'target' }, axis=1, inplace=True)
filename xmin ymin xmax ymax target
0 image_0001.jpg 251 15 444 300 face
1 image_0002.jpg 106 31 296 310 face
2 image_0003.jpg 207 17 385 279 face
3 image_0004.jpg 102 55 303 328 face
4 image_0005.jpg 246 30 446 312 face
... ... ... ... ... ... ...
430 image_0431.jpg 119 16 327 262 face
431 image_0432.jpg 117 14 322 251 face
432 image_0433.jpg 193 24 400 281 face
433 image_0434.jpg 127 13 337 268 face
434 image_0435.jpg 213 20 418 269 face

435 rows × 6 columns

motorcycle = pd.read_csv('dataset/annotations/motorcycle.csv', header=None)
motorcycle.rename({0:'filename',1:'xmin',2:'ymin',3:'xmax',4:'ymax', 5:'target' }, axis=1, inplace=True)
filename xmin ymin xmax ymax target
0 image_0001.jpg 31 19 233 141 motorcycle
1 image_0002.jpg 32 15 232 142 motorcycle
2 image_0003.jpg 30 20 234 143 motorcycle
3 image_0004.jpg 30 15 231 132 motorcycle
4 image_0005.jpg 31 19 232 145 motorcycle
... ... ... ... ... ... ...
793 image_0794.jpg 47 44 218 133 motorcycle
794 image_0795.jpg 44 38 216 135 motorcycle
795 image_0796.jpg 47 40 217 141 motorcycle
796 image_0797.jpg 48 54 211 150 motorcycle
797 image_0798.jpg 42 33 218 140 motorcycle

798 rows × 6 columns

data = pd.concat([airplane, face, motorcycle], ignore_index=True)
filename xmin ymin xmax ymax target
0 image_0001.jpg 49 30 349 137 airplane
1 image_0002.jpg 59 35 342 153 airplane
2 image_0003.jpg 47 36 331 135 airplane
3 image_0004.jpg 47 24 342 141 airplane
4 image_0005.jpg 48 18 339 146 airplane
... ... ... ... ... ... ...
2028 image_0794.jpg 47 44 218 133 motorcycle
2029 image_0795.jpg 44 38 216 135 motorcycle
2030 image_0796.jpg 47 40 217 141 motorcycle
2031 image_0797.jpg 48 54 211 150 motorcycle
2032 image_0798.jpg 42 33 218 140 motorcycle

2033 rows × 6 columns
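
One thing to watch in the combined frame: every class numbers its files from image_0001.jpg, so `filename` alone no longer identifies an image. A possible fix, assuming the images are stored as dataset/images/<target>/<filename> (only the airplane folder is confirmed by these notes), is to build a relative path column. The `path` column name is made up for illustration.

```python
import pandas as pd

# Three rows copied from the annotation tables above, one per class.
data = pd.DataFrame(
    [
        ("image_0001.jpg", 49, 30, 349, 137, "airplane"),
        ("image_0001.jpg", 251, 15, 444, 300, "face"),
        ("image_0001.jpg", 31, 19, 233, 141, "motorcycle"),
    ],
    columns=["filename", "xmin", "ymin", "xmax", "ymax", "target"],
)

# Assumption: images live in dataset/images/<target>/<filename>.
data["path"] = data["target"] + "/" + data["filename"]

# One distinct filename, but three distinct paths.
print(data["filename"].nunique(), data["path"].nunique())
```

With such a column, `flow_from_dataframe` could be pointed at `directory='dataset/images/'` with `x_col='path'` to read all three classes from one frame.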



ig = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
fig = ig.flow_from_dataframe(airplane, directory='dataset/images/airplane/', y_col='target') 
# flow_from_dataframe defaults to classification labels (class_mode='categorical')
# what we want here is regression, so the next call uses class_mode='raw'
Found 800 validated image filenames belonging to 1 classes.
rig = ig.flow_from_dataframe(airplane, directory='dataset/images/airplane/', class_mode='raw',
                             y_col=['xmin','ymin','xmax','ymax'], target_size=(224,224))
Found 800 validated image filenames.
t = next(rig)
import matplotlib.pyplot as plt
import matplotlib.patches as pt
array([1., 1., 1.], dtype=float32)
import imageio
d = imageio.imread('dataset/images/airplane/image_0003.jpg')
(165, 393, 3)
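fig, ax = plt.subplots()
ax.imshow(t[0][0])  # first image of the batch (resized to 224 x 224; box values are still in original pixels)
p = pt.Rectangle((t[1][0][0],t[1][0][1]), t[1][0][2] - t[1][0][0], t[1][0][3] - t[1][0][1], fill=None)
ax.add_patch(p)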
<matplotlib.patches.Rectangle at 0x16852f850>
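
With `class_mode='raw'` the generator resizes each image to 224 x 224 but returns the box columns unchanged, so the targets stay in original pixel coordinates. The sketch below shows one way to bring a box into the resized frame. `scale_box` is a hypothetical helper, and the sample values are image_0003.jpg from the airplane table with the (165, 393, 3) shape shown above.

```python
def scale_box(box, orig_size, target_size=(224, 224)):
    """Rescale (xmin, ymin, xmax, ymax) from orig_size to target_size.

    Both sizes are given as (height, width).
    """
    xmin, ymin, xmax, ymax = box
    sy = target_size[0] / orig_size[0]  # vertical scale factor
    sx = target_size[1] / orig_size[1]  # horizontal scale factor
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


# image_0003.jpg: original shape (165, 393, 3), box (47, 36, 331, 135)
scaled = scale_box((47, 36, 331, 135), orig_size=(165, 393))
print([round(v, 1) for v in scaled])
```

Dividing by the original width and height instead (a target size of (1, 1)) gives coordinates in [0, 1], which are often easier for a regression head to learn.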


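# pretrained VGG16 convolutional base, frozen so that only the new head is trained
vgg = tf.keras.applications.VGG16(include_top=False)
vgg.trainable = False
input_ = tf.keras.Input((224,224,3))
# note: preprocess_input expects pixel values in [0, 255], but the generator above rescales to [0, 1]
x = tf.keras.applications.vgg16.preprocess_input(input_)
x = vgg(x)
x = tf.keras.layers.GlobalAvgPool2D()(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
# four linear outputs: xmin, ymin, xmax, ymax
x = tf.keras.layers.Dense(4)(x)
model = tf.keras.Model(input_, x)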
Model: "model"
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 224, 224, 3)]     0         
 tf.__operators__.getitem (S  (None, 224, 224, 3)      0         
 tf.nn.bias_add (TFOpLambda)  (None, 224, 224, 3)      0         
 vgg16 (Functional)          (None, None, None, 512)   14714688  
 global_average_pooling2d (G  (None, 512)              0         
 dense (Dense)               (None, 128)               65664     
 dense_1 (Dense)             (None, 4)                 516       
Total params: 14,780,868
Trainable params: 66,180
Non-trainable params: 14,714,688
tf.keras.utils.plot_model(model, rankdir='LR')


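temp = next(rig)
# the compile step is not shown in these notes; Adam with an MSE loss is assumed here
model.compile(optimizer='adam', loss='mse')
model.fit(rig, epochs=5)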
<keras.callbacks.History at 0x1685adc70>
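
The notes stop at training. A common way to judge a box regressor afterwards is intersection over union (IoU) between the predicted and annotated boxes. This is a minimal sketch, assuming boxes in (xmin, ymin, xmax, ymax) order. The predicted box in the example is invented for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap width and height, clamped at zero when the boxes do not touch.
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


truth = (49, 30, 349, 137)      # airplane image_0001.jpg from the table above
predicted = (55, 28, 340, 140)  # hypothetical model output
print(round(iou(truth, predicted), 3))
```

Both boxes must be in the same coordinate frame before comparing them, for example both in original pixels or both rescaled to 224 x 224.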