ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent

Vishwak Srinivasan; Adepu Ravi Sankar; Vineeth N Balasubramanian

arXiv:1712.07424·stat.ML·December 21, 2017·2 cites

TL;DR

ADINE introduces an adaptive momentum method for stochastic gradient descent that allows momentum parameters greater than one, enabling faster convergence and better saddle escape in deep neural network training.

Contribution

The paper proposes ADINE, a novel momentum-based optimization method that relaxes the traditional momentum constraint, allowing adaptive higher momentum to improve convergence speed.

Findings

01 ADINE accelerates convergence in deep neural networks.

02 Higher momentum ($> 1$) aids in escaping saddle points.

03 ADINE maintains generalization performance while speeding up training.

Abstract

Two major momentum-based techniques that have achieved tremendous success in optimization are Polyak's heavy ball method and Nesterov's accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter $m$, which is always suggested to be set to less than $1$. Although the choice of $m < 1$ is justified only under very strong theoretical assumptions, it works well in practice even when the assumptions do not necessarily hold. In this paper, we propose a new momentum-based method, $ADINE$, which relaxes the constraint of $m < 1$ and allows the learning algorithm to use adaptive higher momentum. We motivate our hypothesis on $m$ by experimentally verifying that a higher momentum ($\geq 1$) can help escape saddles much faster. Using this motivation, we propose our method $ADINE$ that helps weigh the previous updates more (by setting…
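The saddle-escape intuition in the abstract can be illustrated with a small sketch. This is not ADINE itself and not the paper's experiment: it runs the plain Polyak heavy ball update with a fixed momentum $m$ on a hypothetical toy saddle $f(x, y) = \tfrac{1}{2}x^2 - \tfrac{1}{2}cy^2$. The function name `steps_to_escape`, the learning rate, the curvature `c`, the starting point, and the escape threshold $|y| \geq 1$ are all assumptions chosen for illustration.

```python
def steps_to_escape(m, lr=0.1, c=0.1, y0=1e-6, max_steps=10_000):
    """Count heavy ball steps until |y| >= 1 on f(x, y) = 0.5*x**2 - 0.5*c*y**2.

    Toy setup (assumption, not from the paper): the saddle sits at the origin,
    x is the stable direction and y is the escape direction.
    """
    x, y = 1.0, y0          # start almost exactly on the stable axis
    vx, vy = 0.0, 0.0       # velocity (accumulated previous updates)
    for step in range(1, max_steps + 1):
        gx, gy = x, -c * y  # gradient of the toy saddle
        # Heavy ball update: keep a fraction m of the previous update.
        vx = m * vx - lr * gx
        vy = m * vy - lr * gy
        x, y = x + vx, y + vy
        if abs(y) >= 1.0:
            return step
    return max_steps        # did not escape within the step budget


# Compare no momentum, the common m = 0.9, and the boundary case m = 1.
results = {m: steps_to_escape(m) for m in (0.0, 0.9, 1.0)}
for m, steps in results.items():
    print(f"m = {m}: escaped after {steps} steps")
```

On this toy problem the step count shrinks as $m$ grows, because a larger $m$ enlarges the growth factor along the negative-curvature direction. Note that with a fixed $m = 1$ the stable $x$ coordinate oscillates without decaying, which hints at why the abstract speaks of adapting the momentum rather than fixing it at a high value.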

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications