Initializing the weight matrix and more
Wherever there are neural networks, there are weights, and RNNs are no exception. But before we start dealing with the weights of our RNN, let's see exactly where they are needed.
There are two different weight matrices in the case of an RNN: one for the input neuron (remember that we feed feature vectors only through neurons) and one for the recurrent neuron. A particular state in an RNN is produced using the following two equations:

$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t) \quad (1)$$

$$y_t = W_{hy} h_t \quad (2)$$
In the first equation, $h_t$ is the state at time step $t$, $h_{t-1}$ is the state at the previous time step, $x_t$ is the input at time step $t$, $W_{hh}$ is the weight matrix for the recurrent neuron, and $W_{xh}$ is the weight matrix for the input neuron (don't worry, we will get to the second equation).
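As a minimal sketch of how these two equations translate to code (assuming the general form, where $W_{hh}$ is hidden x hidden; the walkthrough below simplifies it to a 1 x 1 value), they might look like this in NumPy:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh):
    # Equation (1): the new state from the current input and the previous state
    return np.tanh(W_hh @ h_prev + W_xh @ x_t)

def rnn_output(h_t, W_hy):
    # Equation (2): the output computed from the current state
    return W_hy @ h_t
```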
The first pass of the RNN is the letter w. We will randomly initialize the two weight matrices present in equation (1). Assume that, after getting initialized, the input weight matrix $W_{xh}$ is 3 x 4:
- 3 rows, as we have three recurrent neurons in the recurrent layer
- 4 columns, as our vocabulary size is 4 (see the shape check after this list)
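As a quick sanity check on these dimensions (a sketch with assumed values; the one-hot index for w depends on how the vocabulary is ordered), multiplying the 3 x 4 $W_{xh}$ by a 4 x 1 one-hot input gives one value per recurrent neuron:

```python
import numpy as np

# Shape check: (3 x 4) @ (4 x 1) -> (3 x 1), one value per recurrent neuron
W_xh = np.random.randn(3, 4)          # assumed random values
x_w = np.array([[1], [0], [0], [0]])  # assumed one-hot vector for the letter w
print((W_xh @ x_w).shape)             # (3, 1)
```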
The recurrent weight matrix $W_{hh}$ is a 1 x 1 matrix. Let's take its value as 0.35028053. Let's also introduce the bias term b here, which is also a 1 x 1 matrix, with the value 0.6161462. In the next step, we will put these values together and determine the value of $h_t$. (We will deal with the second equation later.)
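Putting the pieces together, a minimal sketch of this step might look as follows; $W_{hh}$ and b use the values given above, while the $W_{xh}$ values, the one-hot index for w, and the all-zero initial state are assumptions for illustration:

```python
import numpy as np

W_xh = np.random.randn(3, 4)          # assumed values; the initialized matrix is not reproduced here
W_hh = np.array([[0.35028053]])       # 1 x 1 recurrent weight from the text
b = np.array([[0.6161462]])           # 1 x 1 bias from the text

x_1 = np.array([[1], [0], [0], [0]])  # assumed one-hot vector for the letter w
h_0 = np.zeros((3, 1))                # assumed all-zero initial state

# h_1 = tanh(W_xh @ x_1 + W_hh * h_0 + b); the 1 x 1 W_hh and b broadcast
h_1 = np.tanh(W_xh @ x_1 + W_hh * h_0 + b)
print(h_1)  # the new state: a 3 x 1 vector, one value per recurrent neuron
```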