
Deep learning resources and advanced methods

One of the more interesting visual tools you can use for both learning and explaining is the interactive widget provided by TensorFlow: http://playground.tensorflow.org/. This tool lets you explore, or tinker, as the site calls it, with the various parameters and see how they affect the response, be it a classification problem or a regression problem. I could spend – well, I have spent – hours tinkering with it.

Here is an interesting task: create your own experimental design and see how the various parameters affect your prediction.

At this point, the fastest-growing deep learning open source tool is TensorFlow. You can access TensorFlow with R, but it requires you to install Python first. What we will go through in the practical exercise is Keras, an API that can run on top of TensorFlow or other backend neural network libraries such as Theano. The creators of Keras designed it to simplify the development and testing of deep neural networks. We will discuss TensorFlow and Keras in a little more depth prior to our implementation of a problem.
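To give you a sense of the setup, here is a minimal sketch of getting Keras running from R; the install_keras() helper in the keras package pulls down the Python environment and TensorFlow backend for you:

```r
# Install the R interface to Keras, then let it set up the
# Python/TensorFlow dependencies it needs behind the scenes
install.packages("keras")
library(keras)
install_keras()  # installs TensorFlow as the default backend
```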

I also really like using MXNet, which does not require the installation of Python and is relatively easy to install and make operational. It also offers a number of trained models that allow you to start making predictions quickly. Several R tutorials are available at http://mxnet.io/.
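As a taste of how quickly you can get going, here is a sketch using mxnet's mx.mlp() convenience function to fit a small multilayer perceptron; train.x (a numeric feature matrix) and train.y (a vector of class labels) are hypothetical placeholders for your own data:

```r
library(mxnet)
# Fit a small multilayer perceptron; train.x and train.y are
# hypothetical placeholders for a feature matrix and label vector
model <- mx.mlp(train.x, train.y,
                hidden_node = 10,           # one hidden layer of 10 neurons
                out_node = 2,               # two output classes
                out_activation = "softmax",
                num.round = 20,             # training epochs
                array.batch.size = 32,
                learning.rate = 0.1)
```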

I now want to take the time to enumerate some of the variations of deep neural networks along with the learning tasks where they have performed well.

Convolutional neural networks (CNNs) assume that the inputs are images and create features from slices, or small portions, of the data, which are combined to create a feature map. Think of these small slices as filters or, more appropriately, kernels that the network learns during training. The typical activation function in a CNN is the rectified linear unit (ReLU), which is simply f(x) = max(0, x), where x is the input to the neuron. CNNs perform well on image classification and object detection.
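To make that concrete, here is a minimal sketch of a small CNN in R Keras, assuming 28 x 28 grayscale inputs (MNIST-style data) and 10 output classes; the relu activation is the f(x) = max(0, x) just described:

```r
library(keras)
# A tiny CNN: one convolutional layer of learned 3x3 kernels,
# pooling to shrink the feature map, then a softmax classifier
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax")
```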

Recurrent neural networks (RNNs) are designed to make use of sequential information. In a traditional neural network, the inputs and outputs are independent of each other. With RNNs, the output depends on the computations from previous time steps, permitting information to persist across the sequence. So the output of a neuron at step t is calculated not only from the input at t but also from the hidden state carried forward from earlier steps (t-1, t-2, and so on). RNNs are effective at handwriting and speech recognition.
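Here is a minimal sketch of a simple RNN in R Keras, assuming hypothetical sequences of 50 time steps with one feature per step and a single numeric target:

```r
library(keras)
# The recurrent layer carries a hidden state from step to step,
# so each output depends on the inputs that came before it
model <- keras_model_sequential() %>%
  layer_simple_rnn(units = 16, input_shape = c(50, 1)) %>%
  layer_dense(units = 1)
```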

Long short-term memory (LSTM) networks are a special case of RNNs. The problem with a plain RNN is that it does not perform well when the relevant signal spans long sequences, as the influence of early steps fades during training. LSTMs were created to capture these long-range patterns. A standard RNN combines information from previous steps in the same way at every step, regardless of whether the information in one step is more or less valuable than in another. LSTMs seek to overcome this limitation by deciding what to remember, and what to forget, at each step during training. They do this with gates: a multiplication of a weight matrix by the data vector that acts as an information filter. An LSTM cell has two inputs and two outputs: it receives the output and the memory vector passed from the previous step, and it produces an updated output and memory vector as inputs to the next step. LSTMs have the limitation of requiring a healthy dose of training data and are computationally intensive. They have performed well on speech recognition problems and in complicated time series analysis.
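Swapping the recurrent layer for an LSTM is a one-line change in R Keras; this sketch assumes the same hypothetical 50-step, one-feature sequences as the previous example:

```r
library(keras)
# The LSTM layer adds the gating machinery described above,
# letting the network decide what to remember and what to forget
model <- keras_model_sequential() %>%
  layer_lstm(units = 32, input_shape = c(50, 1)) %>%
  layer_dense(units = 1)
model %>% compile(loss = "mse", optimizer = "adam")
```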

With that, let's move on to some practical applications.