Deep Learning with R for Beginners

Neural network web application

First, we will look at an R Shiny web application. I encourage you to run the application and follow the examples, as doing so will really help you understand how neural networks work. To run it, you will have to open the Chapter3 project in RStudio.

What is R Shiny?
R Shiny is an R package from the RStudio company that allows you to create interactive web apps using only R code. You can build dashboards and visualizations, and use the full functionality of R. You can extend R Shiny apps with CSS, widgets, and JavaScript, and it is also possible to host your applications online. It is a great tool for showcasing data science applications, and I encourage you to look into it if you are not already familiar with it. For more information, see https://shiny.rstudio.com/, and for examples of what is possible with R Shiny, see https://shiny.rstudio.com/gallery/.
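To give a sense of how compact a Shiny app can be, here is a minimal sketch of a complete app. It is purely illustrative and is not part of this chapter's application; only standard shiny package functions are used:

library(shiny)

# UI: one slider input and one plot output
ui <- fluidPage(
  sliderInput("n", "Number of points:", min = 10, max = 500, value = 100),
  plotOutput("scatter")
)

# Server: re-renders the plot whenever the slider changes
server <- function(input, output) {
  output$scatter <- renderPlot({
    plot(rnorm(input$n), rnorm(input$n), xlab = "x", ylab = "y")
  })
}

shinyApp(ui = ui, server = server)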
  1. Open the server.R file in RStudio and click on the Run App button:
Figure 3.1: How to run an R Shiny application
  2. When you click on the Run App button, you should get a pop-up screen for your web application. The following is a screenshot of the web application after it starts up:
Figure 3.2:  R Shiny application on startup

This web application can be used in the pop-up window or opened in a browser. On the left, there is a set of input choices; these are the parameters for the neural network. They are known as hyper-parameters, to distinguish them from the parameters (the weights and biases) that the model itself optimizes. From top to bottom, these hyper-parameters are as follows (a sketch of the corresponding Shiny controls appears after the list):

  • Select data: There are four different datasets that you can use as training data.
  • Nodes in hidden layer: The number of nodes in the hidden layer. The neural network has only one hidden layer.
  • # Epochs: The number of times that the algorithm iterates over the data during model-building.
  • Learning rate: The learning rate applied during backpropagation. The learning rate affects how much the algorithm changes the weights during every epoch.
  • Activation function: The activation function applied to the output of each node.
  • The Run NN Model button trains a model using the selected input values; the Reset button restores the input choices to their default values.
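As a rough sketch of how controls like these map to Shiny code, the snippet below builds a similar sidebar. The input IDs, ranges, and default values are illustrative guesses; the actual definitions live in the application's own source:

# Hypothetical input controls resembling the application's sidebar
selectInput("dataset", "Select data:",
            choices = c("bulls_eye", "worms", "moon", "blocks"))
sliderInput("nodes", "Nodes in hidden layer:", min = 1, max = 10, value = 3)
sliderInput("epochs", "# Epochs:", min = 500, max = 10000, value = 3000)
sliderInput("lr", "Learning rate:", min = 0.1, max = 20, value = 0.5)
selectInput("activation", "Activation function:",
            choices = c("sigmoid", "tanh", "relu"))
actionButton("run", "Run NN Model")
actionButton("reset", "Reset")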

There are four different datasets to choose from, each with a different data distribution; you can select them from the drop-down box. They have descriptive names; for example, the data plotted in Figure 3.2 is called bulls_eye. These datasets come from another R package that is used to test clustering algorithms. Each dataset has two classes of equal size and is composed of various geometric shapes. You can explore these datasets using the web application. The only change we make to the data is to randomly switch the labels for 5% of the rows. When you run the application, you will notice that there are some red points in the inner circle and some blue points in the outer circle. This is done so that our models should achieve a maximum accuracy of only 0.95 (95%), which gives us confidence that a model is working correctly: if the accuracy is higher than this, the model could be overfitting because the function it has learned is too complex. We will discuss overfitting again in the next section.
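Flipping 5% of the labels takes only a couple of lines in R. The following is a minimal sketch on a toy data frame; the variable names are illustrative and not taken from the application's code:

set.seed(42)  # for reproducibility

# Toy two-class data frame standing in for one of the demo datasets
df <- data.frame(x = rnorm(200), y = rnorm(200),
                 class = rep(c(0, 1), each = 100))

# Randomly switch the class label for 5% of the rows
idx <- sample(nrow(df), size = round(0.05 * nrow(df)))
df$class[idx] <- 1 - df$class[idx]  # assumes labels are coded 0/1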

One of the first steps in machine learning should be to establish a benchmark score; this is useful for gauging your progress. A benchmark could be a rule of thumb or a simple machine learning algorithm; it should not be something that you spend a lot of time working on. In this application, we use a basic logistic regression model as a benchmark. We can see in the previous screenshot that the accuracy for the logistic regression model is only 0.6075, or 60.75%. This is not much over 50%, but recall that logistic regression can only fit a straight line, and this data cannot be separated by a straight line. A neural network should improve on the logistic regression benchmark, so if we get an accuracy of less than 0.6075 on this dataset, something is wrong with our model and we should review it.
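A logistic regression benchmark takes only a few lines with base R's glm function. Here is a minimal sketch; the toy data frame stands in for whichever dataset is selected in the app:

set.seed(42)
# Toy two-class data; in the app this would be the selected dataset
df <- data.frame(x = rnorm(400), y = rnorm(400),
                 class = rbinom(400, size = 1, prob = 0.5))

# Fit logistic regression (a linear decision boundary) and
# measure its accuracy on the training data
fit  <- glm(class ~ x + y, data = df, family = binomial)
pred <- as.numeric(predict(fit, type = "response") > 0.5)
mean(pred == df$class)  # benchmark accuracy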

So let's begin! Click on the Run NN Model button, which runs a neural network model on the data using the input choices. After a few seconds, the application should change to resemble the following screenshot:

Figure 3.3: Neural network model execution with default settings

The application takes a few seconds and then creates a graph of the cost function over the epochs, outputting cost values as the algorithm iterates over the data. The diagnostic messages at the bottom right of the screen also include the final accuracy for the model; there, we can see that the cost decreases during training and that we achieved a final accuracy rate of 0.825. The cost is what the model is trying to minimize; a lower cost generally means better accuracy. It took some time for the cost to start decreasing, as the model initially struggled to find the right weights.
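For a two-class problem like this one, the cost plotted is typically the binary cross-entropy (log loss); whether the application uses exactly this formula is an assumption, but it is the standard choice. A minimal sketch in R:

# Binary cross-entropy (log loss), averaged over all examples
cross_entropy <- function(y, y_hat, eps = 1e-15) {
  y_hat <- pmin(pmax(y_hat, eps), 1 - eps)  # clip to avoid log(0)
  -mean(y * log(y_hat) + (1 - y) * log(1 - y_hat))
}

# Example: true labels versus predicted probabilities
cross_entropy(y = c(1, 0, 1), y_hat = c(0.9, 0.2, 0.7))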

In deep learning models, weights and biases should not be initialized with arbitrary random values. If the random values are poorly scaled, this can lead to problems with training, such as vanishing or exploding gradients: the weights get too small or too large and the model fails to train successfully. Even when training does succeed, poorly initialized weights make the model take longer to train, as we saw earlier. Two of the most popular techniques for initializing weights to avoid these problems are the Xavier initialization and the He initialization (named after their inventors); both still draw random values, but scale their variance by the number of connections feeding each layer (and, for Xavier, also leaving it).
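As a minimal sketch, both schemes can be written in a few lines of R; the function names here are illustrative:

# Xavier (Glorot) initialization: variance scaled by fan-in and fan-out
xavier_init <- function(n_in, n_out) {
  matrix(rnorm(n_in * n_out, sd = sqrt(2 / (n_in + n_out))),
         nrow = n_in, ncol = n_out)
}

# He initialization: variance scaled by fan-in only (well suited to relu)
he_init <- function(n_in, n_out) {
  matrix(rnorm(n_in * n_out, sd = sqrt(2 / n_in)),
         nrow = n_in, ncol = n_out)
}

W1 <- xavier_init(2, 3)  # for example, 2 inputs feeding 3 hidden nodes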

We can see in Figure 3.3 that the cost has not plateaued; the last few values show that it is still decreasing. This indicates that the model can be improved if we train it for longer. Change # Epochs to 7000 and click the Run NN Model button again; the screen will change to resemble the following plot:

Figure 3.4: Neural network model execution with more epochs

Now we get an accuracy of 0.95, which is the maximum possible accuracy rate. We notice that the cost values have plateaued (that is, they are not decreasing further) at around 0.21. This indicates that training the model for longer (that is, for more epochs) will probably not improve the results, regardless of the current accuracy number. If the model has not yet achieved the accuracy we want and the cost values have plateaued, we need to consider changing the architecture of the model or getting more data. Let's look at changing the number of nodes in our model. Click the Reset button to restore the default input values, then change the number of nodes to 7 and click the Run NN Model button. Now the screen will change to the following:

Figure 3.5: Neural network model execution with more nodes

Our accuracy here is only 0.68; compare this to the earlier examples, where we used the same inputs but only three nodes. We actually get worse performance with more nodes! This is because our data has a relatively simple pattern: a model with seven nodes may be too complex for it and will take longer to train. Adding more nodes to a layer increases training time but does not always improve performance.

Let's look at the Learning rate. Click the Reset button to change the input values to their defaults, then change the Learning rate to around 5, and click the Run NN Model button again to replicate the following screen:

Figure 3.6: Neural network model execution with larger learning rate

We get 0.95 accuracy again, which is the best possible accuracy. If we compare this run to the previous examples, we can see that the model converged (that is, the cost function plateaued) much more quickly, after just 500 epochs. We needed fewer epochs, so there is an inverse relationship between the learning rate and the number of training epochs: a higher learning rate may mean you need fewer epochs. But are bigger learning rates always better? Well, no.
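The inverse relationship follows directly from the gradient-descent update rule: each epoch moves every weight by the learning rate times its gradient, so larger steps cover the same ground in fewer epochs. In R terms, each update looks like this (the values are illustrative):

lr   <- 5       # learning rate (step size)
w    <- 0.8     # a current weight
grad <- 0.02    # gradient of the cost with respect to w

# One gradient-descent update: a larger lr takes a bigger step per epoch
w <- w - lr * grad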

Click the Reset button to change the input values to their defaults, then change the Learning rate to the maximum value (20), and click the Run NN Model button again. When you do, you will get similar output to the following:

Figure 3.7: Neural network model execution with too great a learning rate

We get an accuracy rate of 0.83. What just happened? By selecting a huge learning rate, our model failed to converge at all. We can see that the cost function actually increases at the start of training, which indicates that the learning rate is too high. The cost function graph also shows repeating values, which indicates that the gradient-descent algorithm is overshooting the minimum at times.
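You can reproduce this overshooting on a toy problem. The sketch below runs gradient descent on the one-dimensional cost f(w) = w^2, whose minimum is at w = 0; with an oversized learning rate, w flips sign on every step and moves further from the minimum instead of closer:

# Gradient descent on f(w) = w^2, which has gradient 2w and minimum at w = 0
descend <- function(lr, w = 1, steps = 5) {
  for (i in seq_len(steps)) {
    w <- w - lr * 2 * w
    cat(sprintf("step %d: w = %7.3f, cost = %7.3f\n", i, w, w^2))
  }
}

descend(lr = 0.1)  # small steps: the cost decreases steadily
descend(lr = 1.1)  # oversized steps: w oscillates and the cost grows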

Finally, we can look at how the choice of activation function affects model training. If you change the activation function, you may also need to change the learning rate. Click the Reset button to restore the default input values and select tanh as the activation function. With tanh as the activation function and a learning rate of 1.5, the cost gets stuck at around 0.4 from epochs 500 to 3,500 before suddenly decreasing to 0.2. This can happen when a neural network gets stuck in a local optimum. This phenomenon can be seen in the following plot:

Figure 3.8: Neural network model execution with the tanh activation function

In contrast, using the relu activation results in the model training faster. The following is an example where we run only 1,500 epochs with the relu activation and still reach the maximum possible accuracy of 0.95:

Figure 3.9: Neural network model execution with the relu activation function
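For reference, the three activation functions offered by the application can be written in a few lines of R; tanh is built in, while sigmoid and relu are defined here for illustration:

# The three activation functions offered in the demo
sigmoid <- function(x) 1 / (1 + exp(-x))  # squashes input to (0, 1)
relu    <- function(x) pmax(0, x)         # zero for negatives, identity otherwise
# tanh(x) is built into R and squashes input to (-1, 1)

x <- seq(-4, 4, by = 0.1)
plot(x, sigmoid(x), type = "l", ylim = c(-1, 4), ylab = "activation")
lines(x, tanh(x), lty = 2)
lines(x, relu(x), lty = 3)
legend("topleft", legend = c("sigmoid", "tanh", "relu"), lty = 1:3)

One reason relu tends to train faster is that its gradient does not saturate for positive inputs, whereas sigmoid and tanh both flatten out (and so have near-zero gradients) for large positive or negative inputs.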

I encourage you to experiment with the other datasets. For reference, here is the maximum accuracy I got for each of them; an interesting experiment is to see how different activation functions and learning rates work with these datasets:

  • worms (accuracy=0.95): 3 nodes, 3,000 epochs, Learning rate = 0.5, activation = tanh
  • moon (accuracy=0.95): 5 nodes, 5,000 epochs, Learning rate = 5, activation = sigmoid
  • blocks (accuracy=0.9025): 5 nodes, 5,000 epochs, Learning rate = 10, activation = sigmoid

In general, you will see the following:

  • Using more epochs means a longer training time, which may not always be needed.
  • If the model has not achieved the best accuracy and the cost function has plateaued (that is, it is not decreasing by much) toward the end of the training, then running it longer (that is, more epochs) or increasing the learning rate is unlikely to improve performance. Instead, look at changing the model's architecture, such as by changing the # layers (not an option in this demo), adding more nodes, or changing the activation functions.
  • The learning rate must be selected carefully. If the value selected is too low, it will take a long time for the model to train. If the value selected is too high, the model will fail to train.