Convolution
The convolution operation is a linear operation, written with an asterisk (*), that combines two signals into a third.
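For reference, the standard textbook definition of discrete convolution can be written as follows. The symbols f, g, I, and K are the usual generic names (two 1-D signals, an image, and a kernel) and are assumptions rather than notation taken from this text:

```latex
% 1-D discrete convolution of signals f and g
(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m] \, g[n - m]

% 2-D discrete convolution of an image I with a kernel K
(I * K)[i, j] = \sum_{m} \sum_{n} I[m, n] \, K[i - m, j - n]
```

Linearity here means that convolving a weighted sum of signals gives the same weighted sum of the individual convolutions.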
Two-dimensional convolutions are used in image processing to implement image filters, for example, to locate a specific patch or to detect a particular feature in an image.
In CNNs, the convolutional layers filter an input tensor in a tile-like fashion with a small window called a kernel. The kernel defines exactly what the convolution operation filters for, and it produces a strong response wherever it finds that pattern.
The following figure shows the result of convolving an image with a particular kernel called a Sobel filter, which is good at finding edges in an image:
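As a rough illustration of how such an edge filter responds, the following NumPy sketch slides a horizontal-gradient Sobel kernel over a tiny synthetic image. The names `SOBEL_X` and `filter2d_valid`, and the 6x6 toy image, are made up for this example. Like CNN layers in most libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation:

```python
import numpy as np

# Horizontal-gradient Sobel kernel: responds strongly to vertical edges.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)


def filter2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the kernel, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy 6x6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

response = filter2d_valid(image, SOBEL_X)
print(response)
# Every row is [0. 4. 4. 0.]: the response is non-zero only where the
# window straddles the dark-to-bright edge.
```

Flat regions give a response of zero, while windows that overlap the edge give a large value, which is the "strong response" described above.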
As you might have guessed, the parameters to be learned in a convolution layer are the weights of the layer's kernels. During the training of a CNN, the values of these filters are adjusted automatically in order to extract the most useful information for the task at hand.
In traditional neural networks, we would have to convert any input data to a single one-dimensional vector, thus losing all the important spatial information once this vector is sent to a fully connected layer. Moreover, each pixel would need one parameter per neuron, leading to an explosion in the number of parameters for any model with a large input size or depth.
However, in the case of a convolution layer, each kernel slides across the entire input "searching" for specific patches. Kernels in CNNs are small and their size is independent of the size of what they are convolving. As a result, the cost of using conv layers, in terms of parameters, is generally much lower than that of the traditional dense layers we learnt about earlier.
The following figure shows the difference between a traditional fully connected layer and a convolutional (locally connected) layer. Note the huge difference in parameters:
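The difference in parameter count can be made concrete with a little arithmetic. The sketch below uses hypothetical sizes (a 224x224x3 input, 100 dense neurons versus 100 filters of size 5x5) that are chosen only for illustration; the helper names `dense_params` and `conv_params` are likewise invented here:

```python
def dense_params(in_h, in_w, in_c, units):
    """Weights plus biases of a fully connected layer on a flattened input."""
    return in_h * in_w * in_c * units + units


def conv_params(k, in_c, filters):
    """Weights plus biases of a conv layer with square k x k kernels."""
    return k * k * in_c * filters + filters


# Hypothetical example sizes, not taken from the figure.
dense = dense_params(224, 224, 3, units=100)
conv = conv_params(5, 3, filters=100)

print(dense)  # 15052900
print(conv)   # 7600
```

Note that the conv layer's count does not depend on the input height or width at all, only on the kernel size, the number of input channels, and the number of filters.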
Now, suppose we want our convolution layer to look for six different things in its input instead of just one. In this case, we simply give the convolution layer six filters of the same size (5x5x3 here) instead of just one. Each conv filter will then look for a particular pattern in the input.
The input and output for this particular six-filter convolution layer are shown in the following diagram:
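A minimal NumPy sketch of such a layer is given below. It assumes a 32x32x3 input, stride 1, and no padding; these values, the random weights, and the function name `conv_layer` are assumptions made for the example, since only the 5x5x3 filter size and the number of filters are given in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))     # assumed input: height x width x channels
filters = rng.standard_normal((6, 5, 5, 3))  # six filters, each 5x5x3
biases = np.zeros(6)                         # one bias per filter


def conv_layer(image, filters, biases):
    """Naive conv layer: stride 1, no padding, one output channel per filter."""
    n_filters, k = filters.shape[0], filters.shape[1]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.zeros((out_h, out_w, n_filters))
    for f in range(n_filters):
        for i in range(out_h):
            for j in range(out_w):
                # Each filter spans all input channels of the 5x5 patch.
                patch = image[i:i + k, j:j + k, :]
                out[i, j, f] = np.sum(patch * filters[f]) + biases[f]
    return out


feature_maps = conv_layer(image, filters, biases)
print(feature_maps.shape)           # (28, 28, 6)
print(filters.size + biases.size)   # 456 learnable parameters
```

Each filter produces its own two-dimensional feature map, and the six maps are stacked along the last axis to form the layer's output.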
The main hyperparameters that control the behavior of the convolution layer are as follows:
- Kernel size (K): How big your sliding window is, in pixels. Small is generally better, and odd values such as 1, 3, or 5 (and occasionally 7) are usually used.
- Stride (S): How many pixels the kernel window slides at each step of the convolution. This is usually set to 1 so that no locations in the image are missed, but it can be higher if we want to reduce the spatial size of the input at the same time.
- Zero padding (pad): The number of zeros to put around the image border. Using padding allows the kernel to completely filter every location of an input image, including the edges.
- Number of filters (F): How many filters our convolution layer will have. It controls the number of patterns or features that a convolution layer will look for.
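Together, these hyperparameters determine the shape of the layer's output. A commonly used rule is that the output width is floor((W - K + 2 * pad) / S) + 1, where W is the input width (the same holds for the height), and the output depth equals F. The short sketch below applies that rule; the function name `conv_output_size` and the example sizes are illustrative assumptions:

```python
def conv_output_size(w, k, s=1, pad=0):
    """Spatial output size for input size w, kernel k, stride s, zero padding pad."""
    return (w + 2 * pad - k) // s + 1


print(conv_output_size(32, 5))               # 28: no padding shrinks the output
print(conv_output_size(32, 5, pad=2))        # 32: padding of 2 preserves the size for K=5, S=1
print(conv_output_size(32, 5, s=2, pad=2))   # 16: stride 2 roughly halves the size
```

With stride 1 and an odd kernel size, a padding of (K - 1) / 2 keeps the output the same size as the input, which is one reason odd kernel sizes are convenient.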
In TensorFlow 1.x, we find the 2-D convolution layer in the tf.layers module (in TensorFlow 2.x, the equivalent layer is tf.keras.layers.Conv2D), and it can be added to your model as follows:
import tensorflow as tf

conv1 = tf.layers.conv2d(
    inputs=input_layer,  # 4-D tensor: [batch, height, width, channels]
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)