backpropagation, generalization, learning rate, training speed
In gradient descent learning algorithms such as error backpropagation, the learning rate parameter can have a significant effect on generalization accuracy. In particular, decreasing the learning rate below that which yields the fastest convergence can significantly improve generalization accuracy, especially on large, complex problems. The learning rate also directly affects training speed, but not necessarily in the way that many people expect. Many neural network practitioners currently attempt to use the largest learning rate that still allows for convergence, in order to improve training speed. However, a learning rate that is too large can be as slow as a learning rate that is too small, and a learning rate that is too large or too small can require orders of magnitude more training time than one that is in an appropriate range. This paper illustrates how the learning rate affects training speed and generalization accuracy, and thus gives guidelines on how to efficiently select a learning rate that maximizes generalization accuracy.
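The claim that a learning rate that is too large can be as slow as one that is too small can be illustrated with a minimal sketch. It is a toy example and not the paper's experimental setup: it assumes plain gradient descent on the one-dimensional quadratic f(w) = 0.5 * curvature * w**2, where each update multiplies w by (1 - lr * curvature). The function name `steps_to_converge` and all of its parameters are hypothetical, chosen only for this illustration.

```python
def steps_to_converge(lr, curvature=1.0, w0=1.0, tol=1e-6, max_steps=10_000):
    """Count gradient-descent steps until |w| < tol on f(w) = 0.5 * curvature * w**2.

    Returns max_steps if the tolerance is not reached within the step budget.
    """
    w = w0
    for step in range(max_steps):
        if abs(w) < tol:
            return step
        grad = curvature * w   # derivative of 0.5 * curvature * w**2
        w -= lr * grad         # equivalent to w *= (1 - lr * curvature)
    return max_steps


# Toy sweep: small, moderate, and large learning rates (values are assumptions).
for lr in (0.01, 0.5, 1.0, 1.5, 1.99, 2.5):
    print(f"lr={lr:<5} steps={steps_to_converge(lr)}")
```

On this toy problem the contraction factor is |1 - lr * curvature|, so lr = 0.01 and lr = 1.99 shrink the error at about the same slow rate, intermediate values converge far faster, and lr = 2.5 never converges. The sketch only addresses training speed; it says nothing about generalization accuracy, which the paper treats empirically.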
Original Publication Citation
Wilson, D. R. and Martinez, T. R., "The Need for Small Learning Rates on Large Problems", Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'01, pp. 115-119, 2001.
BYU ScholarsArchive Citation
Martinez, Tony R. and Wilson, D. Randall, "The Need for Small Learning Rates on Large Problems" (2001). All Faculty Publications. 1093.
Physical and Mathematical Sciences
© 2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.