References for this chapter
[1] ROSENBLATT F. The perceptron: a probabilistic model for information storage and organization in the brain[J]. Psychological Review, 1958, 65: 386-408.
[2] KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[C]. New York: NIPS, 2012, 1097-1105.
[3] GOODFELLOW I, WARDE-FARLEY D, MIRZA M, et al. Maxout networks[C]. International Conference on Machine Learning, 2013, 1319-1327.
[4] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[5] ELMAN J L. Finding structure in time[J]. Cognitive Science, 1990, 14(2): 179-211.
[6] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[7] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[J]. Journal of Machine Learning Research, 2010, 9: 249-256.
[8] HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]. The IEEE International Conference on Computer Vision, 2015, 1026-1034.
[9] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors[J]. Nature, 1986, 323(6088): 533-536.
[10] POLYAK B T. Some methods of speeding up the convergence of iteration methods[J]. USSR Computational Mathematics and Mathematical Physics, 1964, 4(5): 1-17.
[11] NESTEROV Y. A method of solving a convex programming problem with convergence rate O(1/k^2)[J]. Soviet Mathematics Doklady, 1983, 269: 543-547.
[12] DUCHI J, HAZAN E, SINGER Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159.
[13] KINGMA D P, BA J. Adam: a method for stochastic optimization[C]. International Conference on Learning Representations, 2015, 1-13.
[14] SHALLUE C J, LEE J, ANTOGNINI J, et al. Measuring the effects of data parallelism on neural network training[J]. Journal of Machine Learning Research, 2019, 20: 1-49.
[15] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]. International Conference on Machine Learning, 2015, 448-456.