Deep learning layers
In the earlier code snippets, we saw some layers for a deep learning model, including mx.symbol.FullyConnected, mx.symbol.Activation, and mx.symbol.Dropout. Layers are how models are constructed; they are computational transformations of data. For example, mx.symbol.FullyConnected performs the matrix operation we introduced in Chapter 1, Getting Started with Deep Learning. It is fully connected because all input values are connected to all nodes in the layer. In other deep learning libraries, such as Keras, it is called a dense layer.
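To make the "all inputs connected to all nodes" idea concrete, here is a minimal pure-Python sketch of what a fully connected layer computes. This is an illustration of the underlying matrix operation only, not the MXNet implementation.

```python
# A fully connected layer is a matrix multiplication plus a bias:
# every input value contributes to every output node.

def fully_connected(x, weights, bias):
    """x: list of inputs; weights: one row of weights per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

# Two inputs, three output nodes: each node sees both inputs.
x = [1.0, 2.0]
weights = [[0.5, -1.0],   # weights for node 1
           [1.0,  1.0],   # weights for node 2
           [0.0,  2.0]]   # weights for node 3
bias = [0.1, 0.0, -0.5]
print(fully_connected(x, weights, bias))  # [-1.4, 3.0, 3.5]
```

The layer's parameters are exactly this weight matrix and bias vector; training adjusts them.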
The mx.symbol.Activation layer performs an activation function on the output of the previous layer. The mx.symbol.Dropout layer performs dropout on the output from the previous layer. Other common layer types in MXNet are:
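The two operations can be sketched in a few lines of plain Python. This illustrates the ideas only (here using ReLU as the activation and "inverted" dropout); it is not the MXNet implementation.

```python
import random

def relu(x):
    """ReLU activation: pass positive values through, zero out negatives."""
    return [max(0.0, v) for v in x]

def dropout(x, p, training=True):
    """Randomly zero each element with probability p during training,
    scaling the survivors by 1/(1-p) so the expected output is unchanged.
    At inference time, dropout does nothing."""
    if not training:
        return list(x)
    return [0.0 if random.random() < p else v / (1.0 - p) for v in x]

h = relu([-2.0, 0.5, 3.0])   # [0.0, 0.5, 3.0]
h = dropout(h, p=0.5)        # randomly zeroed, e.g. [0.0, 1.0, 6.0]
```

Note that dropout is only active during training; it is a regularization technique, not part of the model's final computation.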
- mx.symbol.Convolution: Performs a convolutional operation that matches patterns across the data. Convolutional layers are mostly used in computer vision tasks, which we will see in Chapter 5, Image Classification Using Convolutional Neural Networks. They can also be used for Natural Language Processing, which we will see in Chapter 6, Natural Language Processing Using Deep Learning.
- mx.symbol.Pooling: Performs pooling on the output from the previous layer. Pooling reduces the number of elements by taking the average or the maximum value of sections of the input. Pooling layers are commonly used together with convolutional layers.
- mx.symbol.BatchNorm: Used to normalize the outputs (activations) from the previous layer. This is done for the same reason you normalize input data before model-building: it helps the model train better. It also guards against vanishing and exploding gradients, where gradients become extremely small or extremely large during training. Either can cause the model to fail to converge, that is, training will fail.
- mx.symbol.SoftmaxOutput: Applies the softmax function to the output of the previous layer, turning raw scores into class probabilities.
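To see why convolution and pooling pair naturally, here is a toy one-dimensional version of both. Real MXNet layers operate on multi-dimensional tensors with learned filters; this sketch only shows the mechanics.

```python
def conv1d(x, kernel):
    """Slide the kernel over x, taking a dot product at each position."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

def max_pool1d(x, size):
    """Shrink x by keeping the max of each non-overlapping window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

signal = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
edges = conv1d(signal, [1.0, -1.0])  # this kernel detects changes
print(edges)                         # [-1.0, -1.0, 1.0, 1.0, 1.0]
print(max_pool1d(edges, 2))          # [-1.0, 1.0]
```

The convolution reuses the same small kernel at every position, and pooling then summarizes each region, which is what makes this pairing efficient for images.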
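Batch normalization itself is a simple computation. The sketch below shows the core idea for a single feature across a batch: standardize to zero mean and unit variance, then apply a learned scale (gamma) and shift (beta). MXNet's layer does this per feature and also tracks running statistics for inference, which this toy version omits.

```python
def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize a batch of values, then scale and shift.
    eps avoids division by zero when the variance is tiny."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta
            for v in batch]

print(batch_norm([10.0, 20.0, 30.0]))  # approx [-1.22, 0.0, 1.22]
```

However large the incoming values are, the normalized outputs stay in a small, well-behaved range, which is what keeps gradients from vanishing or exploding.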
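The softmax function itself is easy to state: exponentiate each score and divide by the sum, so every output is positive and the outputs sum to 1. A small sketch (subtracting the maximum first is the standard numerical-stability trick):

```python
import math

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)                          # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # approx [0.659, 0.242, 0.099]
print(sum(probs))  # 1.0
```

The largest score gets the largest probability, which is why softmax is the usual final layer for classification.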
There are recognized patterns for using these layers; for example, an activation layer normally follows a fully connected layer. A dropout layer is usually applied after the activation function, but can also sit between the fully connected layer and the activation function. Convolutional layers and pooling layers are often used together in image tasks, in that order. At this stage, there is no need to try to memorize when to use these layers; you will encounter plenty of examples in the rest of this book!