Loss functions
To train our model, we must define something called a loss function. The loss function will tell us how well or badly our model is currently doing its job.
Losses can be found in the tf.losses module. For this model, we will use the hinge loss, which is the loss function used when training a support vector machine (SVM). Hinge loss heavily punishes incorrect predictions. For a single example $(x_i, y_i)$, where $x_i$ is the feature vector of a datapoint and $y_i$ is its label, the hinge loss is as follows:
$$L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)$$
To this, the following will apply: $s_j$ is the raw score the classifier produces for class $j$ given $x_i$, and $s_{y_i}$ is the score of the target class.
In simple words, this equation takes the raw output of the classifier (in our model, three output scores) and ensures that the score of the target class is greater than the score of every other class by at least 1. For each score $s_j$ other than that of the target class, 0 is added to the loss if this restriction is satisfied; otherwise, a penalty equal to the amount by which it is violated is added. Each term therefore has the following form, where $s_{y_i}$ is the score of the target class:
$$\max(0,\; s_j - s_{y_i} + 1)$$
This concept is actually very intuitive because if our weights and biases are trained properly, then the highest of the three produced scores should confidently indicate the correct class that an input example belongs to.
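To make the margin rule concrete, here is a minimal pure-Python sketch of the per-example hinge loss described above. The function name hinge_loss_single and the three sets of scores are made up for illustration; they are not part of the model built in this chapter.

```python
def hinge_loss_single(scores, target, margin=1.0):
    """Multi-class hinge loss for one example.

    scores: list of raw class scores, one per class
    target: index of the correct class
    """
    target_score = scores[target]
    loss = 0.0
    for j, score in enumerate(scores):
        if j == target:
            continue  # the target class itself never contributes
        # Zero when the target beats class j by at least `margin`
        loss += max(0.0, score - target_score + margin)
    return loss


# Target class 0 beats both other classes by more than 1: no penalty.
confident = hinge_loss_single([3.0, 1.0, -1.0], target=0)
# Target class 0 still has the top score, but class 1 is within the margin.
narrow = hinge_loss_single([3.0, 2.5, -1.0], target=0)
# Class 1 outscores the target class 0: a large penalty.
wrong = hinge_loss_single([1.0, 3.0, 0.5], target=0)

print(confident, narrow, wrong)  # 0.0 0.5 3.5
```

Note that the second example is classified correctly yet still incurs a small loss, because the target score does not win by the full margin of 1.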
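Since we feed many training examples in at once during training, we obtain one such loss per example, and these need to be averaged. Therefore, with $L_i$ denoting the hinge loss of the $i$-th example, the total loss to be minimized over a batch of $N$ examples is as follows:
$$L = \frac{1}{N} \sum_{i=1}^{N} L_i$$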
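The averaging step can be sketched the same way. The helper mean_hinge_loss and the small batch of four examples below are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def mean_hinge_loss(batch_scores, batch_targets, margin=1.0):
    """Return (mean loss over the batch, list of per-example losses)."""
    per_example = []
    for scores, target in zip(batch_scores, batch_targets):
        target_score = scores[target]
        # Sum the margin violations over every class except the target
        loss_i = sum(
            max(0.0, s - target_score + margin)
            for j, s in enumerate(scores)
            if j != target
        )
        per_example.append(loss_i)
    return sum(per_example) / len(per_example), per_example


batch_scores = [
    [3.0, 1.0, -1.0],  # confident and correct
    [3.0, 2.5, -1.0],  # correct, but inside the margin
    [1.0, 3.0, 0.5],   # wrong class has the top score
    [0.0, 0.0, 2.0],   # confident and correct (target is class 2)
]
batch_targets = [0, 0, 0, 2]

mean_loss, losses = mean_hinge_loss(batch_scores, batch_targets)
print(losses)     # [0.0, 0.5, 3.5, 0.0]
print(mean_loss)  # 1.0
```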
In our code, the loss function will take two arguments: logits and labels. In TensorFlow, logits is the name for the raw values produced by our model. In our case, this is model_out as this is the output of our model. For labels, we use our label placeholder, y. Remember that the placeholder will be filled for us at runtime:
loss = tf.reduce_mean(tf.losses.hinge_loss(logits=model_out, labels=y))
Because we also want to average our loss across the whole batch of input data, we use tf.reduce_mean to combine all our losses into one loss value that we will minimize.
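It may help to see roughly what this call computes. As far as the TensorFlow 1.x documentation describes it, tf.losses.hinge_loss expects labels of 0 or 1, converts them internally to -1 and +1, and applies max(0, 1 - label * logit) to every entry, which is an elementwise formulation rather than the per-class sum written out earlier. The pure-Python sketch below imitates that behaviour under this assumption; elementwise_hinge is a hypothetical helper, not TensorFlow code, and averaging over all entries is my reading of the default reduction.

```python
def elementwise_hinge(logits, labels):
    """Mean elementwise hinge loss for a batch with 0/1 (one-hot) labels."""
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for logit, label in zip(row_logits, row_labels):
            signed = 2.0 * label - 1.0  # map {0, 1} to {-1, +1}
            total += max(0.0, 1.0 - signed * logit)
            count += 1
    return total / count


# Two examples, three classes each; class 0 is the target in both.
logits = [[3.0, 1.0, -1.0], [1.0, 3.0, 0.5]]
labels = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

loss = elementwise_hinge(logits, labels)
print(loss)  # 1.25
```

If the default reduction does behave this way, the value returned by tf.losses.hinge_loss is already a single number, and the surrounding tf.reduce_mean leaves it unchanged.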
There are many different types of loss functions available, each suited to different machine learning tasks. As we go through the book, we will learn about more of them and when to use each one.