
Batches

Holding the whole dataset in memory to train a network, as in the example in Chapter 1, Setup and Introduction to TensorFlow, is intractable for large datasets. What people do in practice is divide the dataset, during training, into small pieces called mini-batches (or commonly just batches). Each mini-batch is then loaded in turn and fed to the network, where backpropagation and gradient descent are run and the weights are updated. This is repeated for each mini-batch until you have gone through the dataset completely.
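The following is a minimal sketch of this iteration using the tf.data API, assuming TensorFlow 2.x eager execution and a small hypothetical in-memory dataset of random images and labels (real data would normally be streamed from disk):

```python
import numpy as np
import tensorflow as tf

# Hypothetical dataset: 1,000 random 32x32 RGB images with integer class labels.
images = np.random.rand(1000, 32, 32, 3).astype(np.float32)
labels = np.random.randint(0, 10, size=(1000,)).astype(np.int64)

batch_size = 64

# tf.data serves the data one mini-batch at a time, so only `batch_size`
# samples need to be held in memory per training step.
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(batch_size)

for batch_images, batch_labels in dataset:
    # One iteration = one mini-batch; the forward pass, backpropagation,
    # and weight update would happen here.
    print(batch_images.shape, batch_labels.shape)
```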

The gradient calculated on a mini-batch is a noisy estimate of the true gradient over the whole training set, but by repeatedly applying these small, noisy updates we will still eventually converge close to a good minimum of the loss function.
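To make this concrete, here is a small sketch (not from the book, and using a made-up linear regression problem) that computes the gradient of a mean-squared-error loss on the full dataset and on a single mini-batch, so you can compare the "true" gradient with its noisy mini-batch estimate:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy regression data: y = 3x + noise.
x = np.random.rand(1000, 1).astype(np.float32)
y = 3.0 * x + 0.1 * np.random.randn(1000, 1).astype(np.float32)

w = tf.Variable(tf.zeros([1, 1]))

def gradient_on(x_part, y_part):
    # Mean squared error loss and its gradient with respect to the weight.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(tf.matmul(x_part, w) - y_part))
    return tape.gradient(loss, w)

full_grad = gradient_on(x, y)            # gradient over the whole training set
mini_grad = gradient_on(x[:64], y[:64])  # noisy estimate from one mini-batch

print(full_grad.numpy(), mini_grad.numpy())
```

The two values differ, but they point in roughly the same direction, which is why repeated mini-batch updates still make progress toward a minimum.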

Bigger batch sizes give a less noisy estimate of the true gradient, which in turn usually allows a larger learning rate to be used. The trade-off is that more memory is required to hold the batch while training.

When the model has seen your entire dataset once, we say that an epoch has been completed. Due to the stochastic nature of training, you will want to train your model for multiple epochs, as it is unlikely to have converged after only one.
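Putting the pieces together, the sketch below runs several epochs over the same hypothetical data, reshuffling and re-batching it each epoch; the tiny Keras model and hyperparameter values are illustrative assumptions, not the book's own example:

```python
import numpy as np
import tensorflow as tf

# Hypothetical dataset, as before.
images = np.random.rand(1000, 32, 32, 3).astype(np.float32)
labels = np.random.randint(0, 10, size=(1000,)).astype(np.int64)

num_epochs = 5
batch_size = 64

dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .shuffle(buffer_size=1000)  # reshuffles each epoch, so batches differ
           .batch(batch_size))

# A deliberately tiny model, just to make the loop runnable.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(10)
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

for epoch in range(num_epochs):           # one epoch = one full pass over the data
    for batch_images, batch_labels in dataset:
        with tf.GradientTape() as tape:
            logits = model(batch_images)
            loss = loss_fn(batch_labels, logits)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print("epoch", epoch, "loss", float(loss))
```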