CNNs

CNNs are the cornerstone of image classification in deep learning. This section introduces them, outlines their history, and explains why they are so powerful.

Before we begin, we will look at a simple deep learning architecture. Deep learning models are difficult to train, so using an existing architecture is often the best place to start. An architecture here means a published deep learning model that was state-of-the-art when it was released; examples include AlexNet, VGGNet, and GoogLeNet. The architecture we will look at is the original LeNet architecture for digit classification, developed by Yann LeCun and others in the mid-1990s. It was applied to the MNIST dataset, which consists of 28 x 28 grayscale images of the handwritten digits 0 to 9. The following diagram shows the LeNet architecture:

Figure 5.1: The LeNet architecture 
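
Before we dig into the diagram, it is worth confirming what the input data looks like. The following is a minimal sketch, assuming the keras package for R is installed (the original LeNet predates Keras; this is just one convenient way to obtain MNIST):

library(keras)

# Download (on first use) and load the MNIST dataset
mnist <- dataset_mnist()

# 60,000 training images, each a 28 x 28 grid of grayscale pixel values
dim(mnist$train$x)             # 60000 28 28

# The labels are the digits 0 to 9
sort(unique(mnist$train$y))    # 0 1 2 3 4 5 6 7 8 9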

The original images are 28 x 28 in size. We have a series of hidden layers, which are convolutional and pooling layers (labeled subsampling in the diagram). Each convolutional layer changes the structure of the data; for example, after we apply the convolutions in the first hidden layer, the output is three-dimensional: a stack of two-dimensional feature maps rather than a single image. Our final layer is of size 10 x 1, which matches the number of categories. We can apply a softmax function here to convert the values in this layer into probabilities for each category; the category with the highest probability is the prediction for that image.
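
To make this layer structure concrete, here is a minimal LeNet-style sketch using the keras package for R. This is an illustrative approximation rather than the exact original network; the filter counts and activation choices below are assumptions:

library(keras)

# A LeNet-style stack: convolution and pooling (subsampling) layers,
# followed by fully connected layers and a 10-unit output layer
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 6, kernel_size = c(5, 5), padding = "same",
                activation = "tanh", input_shape = c(28, 28, 1)) %>%
  layer_average_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 16, kernel_size = c(5, 5), activation = "tanh") %>%
  layer_average_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 120, activation = "tanh") %>%
  layer_dense(units = 84, activation = "tanh") %>%
  # One unit per digit; softmax turns the 10 values into probabilities
  layer_dense(units = 10, activation = "softmax")

The softmax step itself is simple enough to write in base R: it exponentiates the 10 output values and normalizes them so they sum to 1, and which.max() then picks the predicted category (the scores below are made up for illustration):

# Softmax: convert raw scores into probabilities that sum to 1
softmax <- function(z) exp(z) / sum(exp(z))

scores <- c(1.2, 0.3, -0.8, 2.5, 0.0, -1.1, 0.7, 1.9, -0.2, 0.4)
probs  <- softmax(scores)
which.max(probs) - 1    # predicted digit (subtract 1 because R indexes from 1)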