Recurrent neural networks
Recurrent neural networks (RNNs) are another type of neural network, and they perform remarkably well on NLP tasks such as sentiment analysis, sequence prediction, speech-to-text conversion, language-to-language translation, and so on. Consider an example: you open up Google and start searching for recurrent neural networks. The moment you start typing a word, Google gives you a list of suggestions, most likely topped by the complete word or by the most commonly searched phrase that begins with the letters you have typed so far. This is an example of sequence prediction, where the task is to predict what comes next in a given phrase.
Let's take another example: you are given a bunch of English sentences, each containing one blank. Your task is to fill in each blank with the correct word. To do this, you will need your general knowledge of the English language, and you will need to make use of the context as much as possible. Using previously encountered information like this requires memory. But what about neural networks? Traditional neural networks cannot do this because they do not have any memory. This is exactly where RNNs come into the picture.
The question we need to answer is this: how can we empower neural networks with memory? An absolutely naive idea would be to do the following:
- Feed a sequence into a neuron.
- Take the output of the neuron and feed it back into the same neuron (see the sketch after this list).
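This feedback idea can be sketched in a few lines of code. The snippet below is a minimal illustration (not the book's implementation), assuming a single tanh neuron with two arbitrary toy weights, w_in for the current input and w_rec for the fed-back output; all names and values here are made up for the example.

```python
import numpy as np

# Toy weights, chosen arbitrarily for illustration (not trained values)
w_in, w_rec = 0.5, 0.9

def step(x, prev_output):
    # The current output depends on the current input AND the previous output
    return np.tanh(w_in * x + w_rec * prev_output)

sequence = [1.0, 0.5, -0.3]
output = 0.0                      # nothing has been seen yet
for x in sequence:
    output = step(x, output)      # feed the output back in at the next step
    print(round(float(output), 4))
```

Each pass through the loop mixes the new input with whatever the neuron produced before, which is the informal sense in which the output is "fed to the neuron again".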
It turns out that this idea is not that naive, and in fact constitutes the foundation of the RNN. A single layer of an RNN actually looks like the following:
The loop may seem a bit mysterious. You might already be wondering what happens in each iteration of the loop:
In the preceding diagram, an RNN (the figure on the left) is unrolled to show three simple feedforward networks. But what do these unrolled networks do? Let's find this out now.
Let's consider the task of sequence prediction. To keep it simple, we will look at how an RNN can learn to predict the next letter to complete a word. For example, if we train the network on the set of letters {w, h, a, t}, then after being given the letters w, h, and a sequentially, the network should be able to predict that the next letter is t, so that the meaningful word "what" is produced. Just as in the feedforward networks we saw earlier, X serves as the input vector to the network; in RNN terminology, this vector is also referred to as the vocabulary of the network. The vocabulary of the network is, in this case, {w, h, a, t}.
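To make the input vector concrete, here is a small, assumed illustration of how the letters in this vocabulary could be encoded as one-hot vectors; the helper name one_hot and the index mapping are hypothetical, not taken from the text.

```python
# Hypothetical encoding of the vocabulary {w, h, a, t} as one-hot input vectors
vocabulary = ['w', 'h', 'a', 't']
char_to_index = {ch: i for i, ch in enumerate(vocabulary)}

def one_hot(ch):
    vec = [0] * len(vocabulary)   # one slot per vocabulary entry
    vec[char_to_index[ch]] = 1    # switch on the slot for this letter
    return vec

for ch in 'wha':                  # the letters fed in sequentially
    print(ch, one_hot(ch))
# w [1, 0, 0, 0]
# h [0, 1, 0, 0]
# a [0, 0, 1, 0]
```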
The network is fed the letters w, h, and a sequentially. Let's give indices to the letters:
- w → 0
- h → 1
- a → 2
These indices are known as time-steps (the superscripts in the figure presenting the unrolling of an RNN). When operating on the current time-step, a recurrent layer makes use of the inputs given at previous time-steps through a function applied to them. Let's see, step by step, how the output is produced by this recurrent layer.
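As a rough preview, the sketch below shows one way such a step-by-step computation might look. It assumes a tiny hidden state, a tanh activation, and randomly initialised (untrained) weight matrices named W_xh, W_hh, and W_hy; because the weights are untrained, the printed predictions are arbitrary, and the snippet only illustrates how each time-step reuses the state from the previous one.

```python
import numpy as np

np.random.seed(0)
vocab_size, hidden_size = 4, 3                            # vocabulary {w, h, a, t}, tiny hidden layer
W_xh = np.random.randn(hidden_size, vocab_size) * 0.1     # input  -> hidden
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1    # hidden -> hidden (the loop)
W_hy = np.random.randn(vocab_size, hidden_size) * 0.1     # hidden -> output scores

def one_hot(index):
    v = np.zeros((vocab_size, 1))
    v[index] = 1.0
    return v

h = np.zeros((hidden_size, 1))                 # state before the first time-step
for t, letter_index in enumerate([0, 1, 2]):   # w, h, a at time-steps 0, 1, 2
    x = one_hot(letter_index)
    h = np.tanh(W_xh @ x + W_hh @ h)           # combine current input with previous state
    scores = W_hy @ h                          # unnormalised scores over the vocabulary
    print(f"time-step {t}: highest-scoring index = {int(np.argmax(scores))}")
```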