Gradient descent with momentum --- to accelerate or to super-accelerate?

Goran Nakerst; John Brennan; Masudul Haque

arXiv:2001.06472·cs.LG·January 20, 2020·5 cites

TL;DR

This paper introduces super-acceleration for gradient descent with momentum: the gradient is evaluated at an estimated position several steps ahead, rather than one step ahead as in Nesterov acceleration. The authors report improved convergence on a range of machine learning tasks.

Contribution

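The paper proposes super-acceleration, an extension of Nesterov momentum controlled by one new hyperparameter that sets how far ahead the gradient is evaluated. For a one-parameter quadratic loss, the optimal value of this hyperparameter can be calculated exactly and estimated analytically. The method is applied to both simple and complex loss landscapes.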

Findings

01 Super-acceleration improves convergence on quadratic loss functions.

02 Enhanced performance is observed on synthetic landscapes and on MNIST classification.

03 The method integrates easily with adaptive optimizers such as Adam and RMSProp.

Abstract

We consider gradient descent with 'momentum', a widely used method for loss function minimization in machine learning. This method is often used with 'Nesterov acceleration', meaning that the gradient is evaluated not at the current position in parameter space, but at the estimated position after one step. In this work, we show that the algorithm can be improved by extending this 'acceleration' --- by using the gradient at an estimated position several steps ahead rather than just one step ahead. How far one looks ahead in this 'super-acceleration' algorithm is determined by a new hyperparameter. Considering a one-parameter quadratic loss function, the optimal value of the super-acceleration can be exactly calculated and analytically estimated. We show explicitly that super-accelerating the momentum algorithm is beneficial, not only for this idealized problem, but also for several…
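
The look-ahead idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `super_accelerated_momentum`, the helper `quadratic_grad`, and the parameter names `lr`, `momentum`, and `sigma` are all hypothetical, and the exact parameterization used in the paper may differ. The sketch assumes the gradient is evaluated at `theta + sigma * momentum * m`, so that `sigma = 0` gives plain momentum and `sigma = 1` gives the common formulation of Nesterov acceleration.

```python
def quadratic_grad(curvature):
    """Gradient of the one-parameter quadratic loss 0.5 * curvature * theta**2."""
    return lambda theta: curvature * theta


def super_accelerated_momentum(grad, theta0, lr, momentum, sigma, steps):
    """Momentum descent with a tunable look-ahead (illustrative sketch only).

    Assumed update rule (not taken verbatim from the paper):
        lookahead = theta + sigma * momentum * m
        m         = momentum * m - lr * grad(lookahead)
        theta     = theta + m
    sigma = 0 is plain momentum, sigma = 1 is Nesterov acceleration,
    sigma > 1 looks further ahead ("super-acceleration").
    """
    theta = theta0
    m = 0.0  # momentum (velocity) term, starts at rest
    path = [theta]
    for _ in range(steps):
        # Estimated future position at which the gradient is evaluated.
        lookahead = theta + sigma * momentum * m
        m = momentum * m - lr * grad(lookahead)
        theta = theta + m
        path.append(theta)
    return path
```

To compare settings, one could call `super_accelerated_momentum(quadratic_grad(1.0), 1.0, lr=0.01, momentum=0.9, sigma=s, steps=200)` for a few values of `s` and inspect how quickly the returned path approaches zero. Whether a larger `sigma` helps depends on the learning rate, the momentum, and the curvature, which is consistent with the abstract's statement that an optimal value exists.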

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications

Methods: Adam · RMSProp