The VGG16 transfer learning network
We will take the output of the last pooling layer in the pre-trained VGG16 network and add two fully connected layers of 512 units each, followed by the output layer. The output of the final pooling layer is passed through a global average pooling operation before it reaches the first fully connected layer. We could instead simply flatten the output of the pooling layer; either way, the goal is to turn the pooled output from a two-dimensional spatial grid of feature maps into a one-dimensional vector, which is the input format a fully connected layer expects. The following diagram illustrates the architecture of the new VGG16, based on the pre-trained VGG16:
As shown in the preceding diagram, we will extract the output from the last max-pooling layer in the pre-trained network and attach two fully connected layers before the final output layer. Based on this architecture, the VGG16 model can be defined with Keras, as shown in the following code block:
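from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model

def VGG16_pseudo(dim=224, freeze_layers=10, full_freeze='N'):
    # Load the convolutional base with ImageNet weights, without the original classifier
    model = VGG16(weights='imagenet', include_top=False)
    x = model.output
    # Average each feature map down to one value, giving a 1D vector per image
    x = GlobalAveragePooling2D()(x)
    # Two fully connected layers of 512 units, each followed by dropout
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.5)(x)
    # Softmax output over the five target classes
    out = Dense(5, activation='softmax')(x)
    model_final = Model(inputs=model.input, outputs=out)
    # Freeze the first freeze_layers layers unless full_freeze is 'N'
    if full_freeze != 'N':
        for layer in model.layers[0:freeze_layers]:
            layer.trainable = False
    return model_final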
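To make the pooling choice concrete, the following sketch compares global average pooling with flattening on a tiny feature map, and then counts the parameters the first 512-unit layer would need in each case. It assumes a 224 x 224 input, for which the last VGG16 pooling layer emits a 7 x 7 x 512 tensor. The helper names (global_average_pool, flatten, dense_params) are illustrative and are not part of Keras.

```python
def global_average_pool(feature_map):
    """Average each channel over all spatial positions.

    feature_map is a nested list of shape (height, width, channels).
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        sum(feature_map[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]


def flatten(feature_map):
    """Unroll every value into one long vector, row by row."""
    return [value for row in feature_map for pixel in row for value in pixel]


def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# A toy 2 x 2 feature map with 3 channels
toy = [
    [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0]],
    [[3.0, 30.0, 300.0], [4.0, 40.0, 400.0]],
]
pooled = global_average_pool(toy)  # one value per channel
flat = flatten(toy)                # height * width * channels values

# Assumed VGG16 output for a 224 x 224 image: 7 x 7 spatial grid, 512 channels
gap_first_dense = dense_params(512, 512)
flatten_first_dense = dense_params(7 * 7 * 512, 512)

print(pooled, len(flat))
print(gap_first_dense, flatten_first_dense)
```

Under these assumptions, the first dense layer needs roughly 0.26 million parameters after global average pooling, against roughly 12.8 million after flattening, which is one practical reason to prefer pooling here.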
We use the weights of the VGG16 network pre-trained on ImageNet as the initial weights of the model, and then fine-tune the model. We can also freeze the weights of the first few layers: when full_freeze is set to anything other than 'N', the first freeze_layers layers (10 by default) are frozen. In a CNN, the first few layers learn to detect generic features, such as edges and color composition, so these features do not vary much across domains. Freezing a layer means that its weights are not updated during training. We can experiment with the number of layers to freeze and keep the value that gives the best validation score. Since we are performing multi-class classification, the softmax activation function has been chosen for the output layer.
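The freezing step can be sketched without loading any weights. The stand-in class below mimics only the name and trainable attributes of a Keras layer. The layer names follow the block naming Keras uses for VGG16 with include_top=False, and the name of the input layer is a placeholder, since Keras generates it automatically. FakeLayer and freeze_first are hypothetical helpers for illustration only.

```python
class FakeLayer:
    """Hypothetical stand-in exposing only what the freezing loop touches."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


# Layer order of the VGG16 convolutional base; 'input' is a placeholder name
VGG16_BASE_LAYERS = [
    "input",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]


def freeze_first(layers, freeze_layers=10):
    """Mark the first freeze_layers layers as non-trainable, as in VGG16_pseudo."""
    for layer in layers[0:freeze_layers]:
        layer.trainable = False
    return layers


layers = freeze_first([FakeLayer(name) for name in VGG16_BASE_LAYERS])
frozen = [layer.name for layer in layers if not layer.trainable]
trainable = [layer.name for layer in layers if layer.trainable]

print("frozen:", frozen)
print("trainable:", trainable)
```

With the default of 10, everything up to and including block3_conv3 is frozen, so the first three convolutional blocks keep their ImageNet weights while blocks 4 and 5 and the new dense layers are fine-tuned. In a real Keras model, the trainable flags should be set before the model is compiled.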