ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent
Vishwak Srinivasan, Adepu Ravi Sankar, Vineeth N Balasubramanian

TL;DR
ADINE introduces an adaptive momentum method for stochastic gradient descent that allows momentum parameters greater than one, enabling faster convergence and better saddle escape in deep neural network training.
Contribution
The paper proposes ADINE, a novel momentum-based optimization method that relaxes the traditional momentum constraint, allowing adaptive higher momentum to improve convergence speed.
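This summary does not specify how ADINE adapts its momentum. The sketch below is only a hypothetical illustration of what "adaptive higher momentum" could look like. It is not the authors' algorithm. It runs heavy-ball SGD on a scalar parameter, where the velocity is updated as `v = m * v - lr * grad` and then added to the parameter. The momentum `m` switches between an assumed high value above 1 and an assumed conventional value below 1. The switch depends on whether the current loss beats a running average of past losses. The class name, the switching rule, and the defaults `m_high`, `m_low`, `beta` and `lr` are all our own assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HypotheticalAdaptiveMomentum:
    """Heavy-ball SGD on a scalar parameter with a HYPOTHETICAL momentum-switching rule.

    This is NOT the ADINE update from the paper: comparing the current loss with a
    running average of past losses is an illustrative assumption.
    """
    lr: float = 0.01
    m_high: float = 1.01   # assumed "higher momentum" (> 1), used while the loss is improving
    m_low: float = 0.9     # assumed conventional momentum (< 1), used otherwise
    beta: float = 0.9      # smoothing factor for the running loss average
    velocity: float = 0.0
    loss_avg: Optional[float] = None

    def choose_momentum(self, loss: float) -> float:
        # Use the high momentum when the loss beats its running average (or on the first call).
        improving = self.loss_avg is None or loss < self.loss_avg
        # Update the average only after deciding, so the test compares against past losses.
        if self.loss_avg is None:
            self.loss_avg = loss
        else:
            self.loss_avg = self.beta * self.loss_avg + (1.0 - self.beta) * loss
        return self.m_high if improving else self.m_low

    def step(self, param: float, grad: float, loss: float) -> float:
        m = self.choose_momentum(loss)
        # Heavy-ball update: v <- m * v - lr * grad, then param <- param + v.
        self.velocity = m * self.velocity - self.lr * grad
        return param + self.velocity


if __name__ == "__main__":
    opt = HypotheticalAdaptiveMomentum()
    print(opt.choose_momentum(5.0), opt.choose_momentum(4.0), opt.choose_momentum(6.0))
```

A fallback below 1 matters because a fixed momentum above 1 makes the heavy-ball iteration diverge along strongly convex quadratic directions. The loss-average test is just one simple trigger for that fallback.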
Findings
ADINE accelerates convergence in deep neural networks.
Higher momentum ($> 1$) aids in escaping saddle points.
ADINE maintains generalization performance while speeding up training.
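The saddle-escape finding can be illustrated with a toy experiment of our own. This is not an experiment from the paper. The sketch runs the plain heavy-ball update on the quadratic saddle $f(x, y) = \tfrac12 x^2 - \tfrac12 y^2$, starting almost on the stable manifold. It counts the steps until $|y|$ exceeds a threshold. The function name, learning rate, start point and threshold are arbitrary assumptions.

```python
from typing import Optional


def heavy_ball_escape_steps(momentum: float, lr: float = 0.01, y0: float = 1e-3,
                            threshold: float = 1.0, max_steps: int = 10_000) -> Optional[int]:
    """Count heavy-ball steps until |y| > threshold on f(x, y) = 0.5*x**2 - 0.5*y**2."""
    x, y = 1.0, y0        # start close to the saddle's stable direction (y almost 0)
    vx, vy = 0.0, 0.0     # momentum buffers start at zero
    for step in range(1, max_steps + 1):
        gx, gy = x, -y    # gradient of f: (df/dx, df/dy) = (x, -y)
        vx = momentum * vx - lr * gx
        vy = momentum * vy - lr * gy
        x, y = x + vx, y + vy
        if abs(y) > threshold:
            return step
    return None           # did not escape within max_steps


if __name__ == "__main__":
    for m in (0.0, 0.9, 1.1):
        print(f"momentum={m}: escaped after {heavy_ball_escape_steps(m)} steps")
```

Along $y$ the iterates follow $y_{t+1} = (1 + \eta + m)\,y_t - m\,y_{t-1}$, where $\eta$ is the learning rate. The growth rate is the largest root of $r^2 - (1 + \eta + m)\,r + m = 0$. For the values tried here it rises with $m$ (about 1.01, 1.065 and 1.169), so escape gets faster as momentum grows. Along the stable $x$ direction, however, the complex roots have modulus $\sqrt{m}$. A fixed $m > 1$ therefore also amplifies oscillations there, and this toy captures none of ADINE's adaptive behaviour.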
Abstract
Two major momentum-based techniques that have achieved tremendous success in optimization are Polyak's heavy ball method and Nesterov's accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter, which is always suggested to be set to less than $1$. Although a momentum parameter below $1$ is justified only under very strong theoretical assumptions, it works well in practice even when those assumptions do not necessarily hold. In this paper, we propose a new momentum-based method, ADINE, which relaxes the constraint that the momentum parameter be less than $1$ and allows the learning algorithm to use adaptive higher momentum. We motivate our hypothesis on the momentum parameter by experimentally verifying that a higher momentum ($> 1$) can help escape saddles much faster. Using this motivation, we propose ADINE, which helps weigh the previous updates more (by setting…
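For reference, the two classical updates named in the abstract can be written in one common convention. Here $\theta$ denotes the parameters, $v$ the velocity, $m$ the momentum parameter and $\eta$ the learning rate. These symbols are our notation and not necessarily the paper's. The last line unrolls the heavy-ball velocity from $v_0 = 0$.

```latex
\begin{aligned}
\text{Heavy ball:}\quad & v_{t+1} = m\,v_t - \eta\,\nabla f(\theta_t), && \theta_{t+1} = \theta_t + v_{t+1},\\
\text{Nesterov:}\quad & v_{t+1} = m\,v_t - \eta\,\nabla f(\theta_t + m\,v_t), && \theta_{t+1} = \theta_t + v_{t+1},\\
\text{Unrolled heavy ball } (v_0 = 0):\quad & v_{t+1} = -\eta \sum_{k=0}^{t} m^{\,t-k}\,\nabla f(\theta_k).
\end{aligned}
```

The unrolled form shows that with $m < 1$ older gradients are discounted geometrically. With $m > 1$ they instead receive geometrically growing weight, which is one way to read "weigh the previous updates more".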
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications
