Gradient descent with momentum --- to accelerate or to super-accelerate?
Goran Nakerst, John Brennan, Masudul Haque

TL;DR
This paper introduces a super-acceleration technique for gradient descent with momentum, which evaluates the gradient at an estimated position several steps ahead rather than one, leading to improved convergence in various machine learning tasks.
Contribution
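The paper proposes a super-acceleration method that extends Nesterov momentum with a new look-ahead hyperparameter. For a one-parameter quadratic loss function, the optimal value of this hyperparameter can be calculated exactly and estimated analytically; the method is also applicable to more complex loss landscapes.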
Findings
Super-acceleration improves convergence on quadratic loss functions.
Enhanced performance is observed on synthetic landscapes and on MNIST classification.
The method integrates easily with adaptive optimizers such as Adam and RMSProp.
Abstract
We consider gradient descent with 'momentum', a widely used method for loss function minimization in machine learning. This method is often used with 'Nesterov acceleration', meaning that the gradient is evaluated not at the current position in parameter space, but at the estimated position after one step. In this work, we show that the algorithm can be improved by extending this 'acceleration', that is, by using the gradient at an estimated position several steps ahead rather than just one step ahead. How far one looks ahead in this 'super-acceleration' algorithm is determined by a new hyperparameter. Considering a one-parameter quadratic loss function, the optimal value of the super-acceleration can be exactly calculated and analytically estimated. We show explicitly that super-accelerating the momentum algorithm is beneficial, not only for this idealized problem, but also for several…
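The look-ahead idea in the abstract can be illustrated with a minimal Python sketch on a one-parameter quadratic loss. The update rule below is an assumed form, not taken from the paper: the gradient is evaluated at `theta + sigma * momentum * velocity`, so that `sigma = 0` gives plain momentum, `sigma = 1` gives a Nesterov-style one-step look-ahead, and `sigma > 1` looks further ahead. The names `super_accelerated_momentum`, `sigma`, `lr`, `momentum`, and `quadratic_grad` are hypothetical and chosen only for illustration.

```python
def super_accelerated_momentum(grad, theta0, lr=0.1, momentum=0.9,
                               sigma=2.0, steps=200):
    """Minimize a one-parameter loss given its gradient function `grad`.

    Assumed update (illustrative, not necessarily the paper's exact rule):
        lookahead = theta + sigma * momentum * velocity
        velocity  = momentum * velocity - lr * grad(lookahead)
        theta     = theta + velocity
    sigma = 0 is plain momentum, sigma = 1 a Nesterov-style look-ahead,
    sigma > 1 evaluates the gradient further ahead.
    """
    theta, velocity = float(theta0), 0.0
    for _ in range(steps):
        # Estimated position `sigma` momentum steps ahead of the current one.
        lookahead = theta + sigma * momentum * velocity
        velocity = momentum * velocity - lr * grad(lookahead)
        theta += velocity
    return theta


def quadratic_grad(theta, curvature=1.0):
    """Gradient of the quadratic loss 0.5 * curvature * theta**2."""
    return curvature * theta


if __name__ == "__main__":
    # Compare the distance from the minimum (theta = 0) for three look-aheads.
    for sigma in (0.0, 1.0, 2.0):
        final = super_accelerated_momentum(quadratic_grad, theta0=1.0, sigma=sigma)
        print(f"sigma={sigma}: |theta| after 200 steps = {abs(final):.3e}")
```

For these particular settings (learning rate 0.1, momentum 0.9, unit curvature) the iteration is a linear map whose contraction rate per step improves as `sigma` increases from 0 to 2. This is consistent with the abstract's claim for the quadratic case, but the values here are illustrative and are not the optimal ones derived in the paper.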
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications
Methods: Adam · RMSProp
