Srinivasan, Vishwak and Sankar, Adepu Ravi and Balasubramanian, Vineeth N
(2018)
ADINE: an adaptive momentum method for stochastic gradient descent.
In: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 11-13 January 2018, Goa, India.
Abstract
Momentum-based learning algorithms are among the most successful methods in both convex and non-convex optimization. Two major momentum-based techniques that have achieved tremendous success in gradient-based optimization are Polyak's heavy-ball method and Nesterov's accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter m, which is always set to less than 1. Although the choice of m < 1 is justified only under very strong theoretical assumptions, it works well in practice. In this paper we propose a new momentum-based method, ADINE, which relaxes the constraint m < 1 and allows the learning algorithm to use an adaptive, higher momentum. We motivate this relaxation on m by experimentally verifying that a higher momentum (≥ 1) can help escape saddle points much faster. ADINE uses this intuition to weigh previous updates more heavily, inherently setting a larger momentum parameter in the optimization method. To the best of our knowledge, this idea of increased momentum is the first of its kind. We evaluate ADINE on deep neural networks and show that it helps the learning algorithm converge much faster without compromising generalization error.
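The abstract does not spell out the ADINE update rule itself; the following is a minimal sketch, assuming a classical heavy-ball update in which the momentum coefficient m is occasionally allowed to exceed 1. The adaptive schedule shown here (raising m on every tenth step) is a hypothetical illustration of the "higher momentum" idea, not the authors' actual method.

    import numpy as np

    def heavy_ball_step(w, v, grad, lr=0.1, m=0.9):
        """One heavy-ball update: v <- m*v - lr*grad; w <- w + v."""
        v = m * v - lr * grad
        return w + v, v

    # Toy quadratic objective f(w) = 0.5 * ||w||^2, whose gradient is w.
    w = np.array([5.0, -3.0])
    v = np.zeros_like(w)
    for t in range(100):
        grad = w
        # Hypothetical adaptive rule (an assumption, not ADINE's schedule):
        # occasionally set m >= 1 to weigh previous updates more heavily,
        # matching the paper's motivation of escaping saddle points faster.
        m_t = 1.05 if t % 10 == 0 else 0.9
        w, v = heavy_ball_step(w, v, grad, lr=0.1, m=m_t)

    print("final iterate:", w)

With m fixed below 1 the velocity is a geometrically decaying average of past gradients; letting m_t exceed 1 on some steps amplifies the accumulated direction instead, which is the behaviour the paper argues helps leave saddle regions more quickly.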