TensorFlow 1.x Deep Learning Cookbook

How to do it...

We proceed with the recipe as follows:

  1. The first thing we need to decide is the optimizer we want to use. TensorFlow provides a wide variety of optimizers. We start with the most popular and simplest one, the gradient descent optimizer:
tf.train.GradientDescentOptimizer(learning_rate)
  2. The learning_rate argument to GradientDescentOptimizer can be a constant or a tensor; its value typically lies between 0 and 1. Passing a tensor lets the learning rate change during training, as in the sketch below.
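For instance, here is a minimal sketch of passing a learning rate tensor that decays during training, built with tf.train.exponential_decay (the variable names and decay settings are illustrative choices only):

import tensorflow as tf

# A counter of training steps; pass it to minimize() so it is incremented per update
global_step = tf.Variable(0, trainable=False)
# Learning rate tensor: starts at 0.1 and is multiplied by 0.96 every 1000 steps
decayed_lr = tf.train.exponential_decay(learning_rate=0.1,
                                        global_step=global_step,
                                        decay_steps=1000,
                                        decay_rate=0.96,
                                        staircase=True)
optimizer = tf.train.GradientDescentOptimizer(decayed_lr)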
  3. The optimizer must be told which function to minimize. This is done with its minimize method, which computes the gradients and applies them to the learning coefficients. Its signature, as given in the TensorFlow docs, is the following (a short example of the optional arguments follows the signature):
minimize(
    loss,
    global_step=None,
    var_list=None,
    gate_gradients=GATE_OP,
    aggregation_method=None,
    colocate_gradients_with_ops=False,
    name=None,
    grad_loss=None
)
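As a minimal, self-contained sketch of the optional arguments (the toy quadratic loss and the variable names W and b are illustrative only): global_step is incremented by one on every update, while var_list restricts the update to the listed variables:

import tensorflow as tf

W = tf.Variable(5.0, name='W')
b = tf.Variable(3.0, name='b')
loss = tf.square(W - 1.0) + tf.square(b - 2.0)   # toy loss, for illustration only

global_step = tf.Variable(0, trainable=False, name='global_step')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# Only W is updated because of var_list=[W]; global_step increases by 1 per call
train_step = optimizer.minimize(loss, global_step=global_step, var_list=[W])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(train_step)
    print(sess.run([W, b, global_step]))   # b stays at 3.0, global_step is 10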
  4. Combining it all, we define the computational graph:
...
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train_step = optimizer.minimize(loss)
...

# Execution Graph
with tf.Session() as sess:
    ...
    sess.run(train_step, feed_dict={X: X_data, Y: Y_data})
    ...
  5. The X and Y data fed via feed_dict can be single X and Y points (stochastic gradient descent), the entire training set (vanilla gradient descent), or mini-batches; a self-contained mini-batch sketch follows.
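The following is a minimal, self-contained sketch of the mini-batch variant for a simple linear model (the synthetic data, variable names, and batch size of 10 are illustrative choices, not part of the recipe); feeding one row per step would give the stochastic variant, and feeding all of X_data at once the vanilla one:

import numpy as np
import tensorflow as tf

# Synthetic data, y = 3x + 2, used only to make the sketch runnable
X_data = np.random.rand(100, 1).astype(np.float32)
Y_data = 3.0 * X_data + 2.0

X = tf.placeholder(tf.float32, [None, 1])
Y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
loss = tf.reduce_mean(tf.square(Y - (tf.matmul(X, w) + b)))

train_step = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

batch_size = 10
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        # Walk over the training set in slices of batch_size rows
        for start in range(0, len(X_data), batch_size):
            batch = {X: X_data[start:start + batch_size],
                     Y: Y_data[start:start + batch_size]}
            sess.run(train_step, feed_dict=batch)
    print(sess.run([w, b]))   # should approach [[3.0]] and [2.0]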
  6. Another variation of gradient descent adds a momentum term (we will find out more about this in Chapter 3, Neural Networks - Perceptrons). For this, we use the optimizer tf.train.MomentumOptimizer(). It takes both learning_rate and momentum as init arguments:
optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.5).minimize(loss)
  7. We can have an adaptive, monotonically decreasing learning rate if we use tf.train.AdadeltaOptimizer(), which takes two init arguments, learning_rate and the decay factor rho:
optimizer = tf.train.AdadeltaOptimizer(learning_rate=0.8, rho=0.95).minimize(loss)
  8. TensorFlow also supports Hinton's RMSprop, which works in a manner similar to Adadelta, through tf.train.RMSPropOptimizer():
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.01, decay=0.8, momentum=0.1).minimize(loss)

There are some subtle differences between Adadelta and RMSprop. To find out more about them, you can refer to http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf and https://arxiv.org/pdf/1212.5701.pdf.

  9. Another popular optimizer supported by TensorFlow is the Adam optimizer. It computes individual adaptive learning rates for the different coefficients using estimates of the first and second moments of the gradients:
optimizer = tf.train.AdamOptimizer().minimize(loss)
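For reference, the call above with no arguments is equivalent to spelling out the default hyperparameters (a minimal sketch; loss is assumed to be defined as before):

# Adam's defaults: base learning rate, decay rates for the first and second
# moment estimates, and a small epsilon for numerical stability
optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9,
                                   beta2=0.999, epsilon=1e-08).minimize(loss)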
  10. Besides these, TensorFlow also provides the following optimizers:
tf.train.AdagradOptimizer                   # Adagrad optimizer
tf.train.AdagradDAOptimizer                 # Adagrad dual averaging optimizer
tf.train.FtrlOptimizer                      # Follow The Regularized Leader optimizer
tf.train.ProximalGradientDescentOptimizer   # Proximal gradient descent optimizer
tf.train.ProximalAdagradOptimizer           # Proximal Adagrad optimizer
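All of these expose the same minimize() interface, so any of them can be swapped into the computational graph above with a one-line change (a minimal sketch; loss is assumed to be defined as before, and the learning rates are arbitrary):

# Any optimizer from the list can replace the gradient descent optimizer directly
optimizer = tf.train.AdagradOptimizer(learning_rate=0.01).minimize(loss)
# or, for example
optimizer = tf.train.FtrlOptimizer(learning_rate=0.01).minimize(loss)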