Learning neural network weights
To follow this section, let us assume that the person in question will eventually be affected by heart disease, which means that the desired output of our sigmoid function is 0.
We begin by assigning some random non-zero values to the weights in the equation, as shown in the following diagram:
We do this because we do not know in advance what the initial values of the weights should be.
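As a minimal sketch of this random initialization (the range and seed below are illustrative choices, not the book's):

```python
import numpy as np

# A seeded generator so the "random" starting point is reproducible;
# the range is an arbitrary illustrative choice of non-zero values.
rng = np.random.default_rng(seed=42)
w = rng.uniform(low=0.1, high=1.0, size=4)  # one non-zero weight per feature
```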
We now do what we learned in the previous section: we move through our network in the forward direction, from the input layer to the output layer. We multiply each feature by its weight, sum the products, and apply the result to the sigmoid function. Here is what we obtain as the final output:
The weighted sum comes out to 4,109, which, when applied to the activation function, gives us a final output of 1, the complete opposite of the answer we were looking for.
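Here is a rough sketch of that forward pass. The individual feature values and weights are my own illustrative numbers, chosen only so the weighted sum lands at 4,109 as in the text:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative features (age, genes, weight, lifestyle) and random weights.
x = np.array([65.0, 1.0, 90.0, 7.0])
w = np.array([40.0, 200.0, 14.0, 7.0])

z = np.dot(w, x)     # 65*40 + 1*200 + 90*14 + 7*7 = 4109
print(z)             # 4109.0
print(sigmoid(z))    # 1.0 -- the sigmoid saturates for large positive inputs
```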
What do we do to improve the situation? The answer is a backward pass: we move through our model from the output layer back to the input layer and adjust the weights, so that the next forward pass produces a much better result.
To counter this, the neural network will try to vary the values of the weights, as depicted in the following diagram:
It lowers the weight on the age feature so that age contributes negatively to the equation; it slightly increases the weight on lifestyle, because that feature contributes positively; and it assigns negative weights to the genes and weight features.
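The text does not spell out the update rule, but the standard mechanism behind such adjustments is gradient descent. Below is a minimal single-example sketch, continuing with the `sigmoid`, `w`, and `x` from the snippet above and assuming the usual sigmoid/cross-entropy pairing; the learning rate and gradient formula are standard textbook choices, not from this section. Note that for one all-positive example with a target of 0, this rule lowers every weight; the mixed up-and-down adjustments described above arise when contributions from many examples are summed, as the end of this section explains.

```python
y = 0.0                        # desired output for this person
y_hat = sigmoid(np.dot(w, x))  # current prediction, here ~1.0
learning_rate = 0.01           # illustrative step size

# For sigmoid + cross-entropy, the gradient of the loss with respect
# to each weight is (y_hat - y) * x, so weights attached to large
# positive features get pushed down the hardest.
w = w - learning_rate * (y_hat - y) * x
```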
We do another forward pass, and this time we obtain a smaller value, 275, but the sigmoid function still produces an output of 1:
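A quick check with the standard sigmoid formula shows why 275 still saturates the function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma(275) = \frac{1}{1 + e^{-275}} \approx 1 - 3.7\times 10^{-120} \approx 1$$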
We do a backward pass again, and this time we may have to vary the weights even further:
The next time we do a forward pass, the equation produces a negative value, and when we apply it to the sigmoid function, we obtain a final output of 0:
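The same formula shows why any reasonably large negative sum pins the output at 0; the exact sum is not given in the text, so the value below is illustrative:

$$\sigma(-50) = \frac{1}{1 + e^{50}} \approx 1.9\times 10^{-22} \approx 0$$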
Comparing this 0 with the required value, we see that they match, so it is time to stop: the network now knows how to make this prediction.
A forward pass and a backward pass together are called one iteration. In reality, we have 1,000, 100,000, or even millions of examples, and before we change the weights, we take the contribution of each of these examples into account. Basically, we sum up the contribution of each example, and only then do we change the weights.
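Here is a minimal sketch of that batch update, assuming the same sigmoid/cross-entropy gradient as above; the function and variable names are my own:

```python
import numpy as np

def sigmoid(z):
    # For very large negative sums NumPy may warn about overflow in exp,
    # but the result is still a clean 0.0.
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, w, learning_rate=0.01, iterations=100):
    """Batch gradient descent over all examples at once.

    X: (n_examples, n_features) feature matrix
    y: (n_examples,) desired outputs (0 or 1)
    w: (n_features,) initial weights
    """
    for _ in range(iterations):
        y_hat = sigmoid(X @ w)            # forward pass for every example
        # Sum each example's contribution to the gradient ...
        gradient = X.T @ (y_hat - y)
        # ... and only then change the weights, once per iteration.
        w = w - learning_rate * gradient
    return w
```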