Batches
The idea of having the whole dataset in memory to train networks, as in the example in Chapter 1, Setup and Introduction to TensorFlow, is intractable for large datasets. What people do in practice is divide the dataset, during training, into small pieces called mini-batches (or often just batches). Then, in turn, each mini-batch is loaded and fed to the network, where backpropagation computes the gradients and gradient descent updates the weights. This is repeated for each mini-batch until you have gone through the dataset completely.
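The following is a minimal sketch of this loop using TensorFlow's tf.data API and eager execution; the toy dataset, model architecture, batch size, and learning rate are all illustrative assumptions rather than values from the book's examples.

```python
import tensorflow as tf

# Toy dataset (assumed shapes for illustration): 1,000 samples with
# 10 features each and a scalar regression target.
features = tf.random.normal([1000, 10])
targets = tf.random.normal([1000, 1])

batch_size = 32  # assumed value; tune it against your memory budget

# Split the dataset into shuffled mini-batches.
dataset = (tf.data.Dataset.from_tensor_slices((features, targets))
           .shuffle(buffer_size=1000)
           .batch(batch_size))

# A small illustrative model, not the one from the book.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

# One pass over all mini-batches: for each batch, backpropagation
# computes the gradients and gradient descent updates the weights.
for x_batch, y_batch in dataset:
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)
        loss = loss_fn(y_batch, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```

Repeating the outer loop several times gives multiple passes over the data, with each pass seeing the batches in a new shuffled order.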
The gradient calculated for a mini-batch is a noisy estimate of the true gradient of the whole training set, but by repeatedly applying these small, noisy updates, we will still eventually converge close to a good minimum of the loss function.
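Concretely, if $L(\theta)$ denotes the average loss over the full training set, the gradient averaged over a mini-batch $B$ is an unbiased estimate of the true gradient (the per-example loss $\ell$ and parameter vector $\theta$ here are standard notation rather than symbols from the book):

$$\hat{g}_B = \frac{1}{|B|} \sum_{i \in B} \nabla_\theta\, \ell(x_i, y_i; \theta), \qquad \mathbb{E}_B\left[\hat{g}_B\right] = \nabla_\theta L(\theta)$$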
Larger batch sizes give a better (lower-variance) estimate of the true gradient, which in turn allows the use of a larger learning rate. The trade-off is that more memory is required to hold the batch while training.