Nesterov's Accelerated Gradient and Momentum as approximations to Regularised Update Descent

Aleksandar Botev; Guy Lever; David Barber

arXiv:1607.01981 · stat.ML · July 12, 2016

TL;DR

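This paper introduces a unifying framework for adapting the update direction in gradient-based optimization, re-derives classical momentum and Nesterov's accelerated gradient as special cases, and proposes a new algorithm, Regularised Gradient Descent, that can converge more quickly than either.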

Contribution

It unifies classical momentum and Nesterov's accelerated gradient under a single framework and introduces Regularised Gradient Descent, which the authors show can converge more quickly than either method.

Findings

01

Regularised Gradient Descent can converge more quickly than either Nesterov's accelerated gradient or classical momentum.

02

Provides a new intuitive interpretation of Nesterov's accelerated gradient.

03

Unifies existing gradient optimization techniques under a common framework.

Abstract

We present a unifying framework for adapting the update direction in gradient-based iterative optimization methods. As natural special cases we re-derive classical momentum and Nesterov's accelerated gradient method, lending a new intuitive interpretation to the latter algorithm. We show that a new algorithm, which we term Regularised Gradient Descent, can converge more quickly than either Nesterov's algorithm or the classical momentum algorithm.
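
For readers who want the two classical methods named in the abstract side by side, the following is a minimal sketch of their standard textbook update rules, not the paper's derivation and not its Regularised Gradient Descent algorithm. The quadratic test objective, the function names, and the parameter names `lr` and `mu` are all assumptions introduced for illustration. The only difference between the two steps is where the gradient is evaluated: classical momentum uses the current point, while Nesterov's method uses the look-ahead point reached by first applying the momentum term.

```python
# Illustrative sketch only: standard textbook forms of classical momentum and
# Nesterov's accelerated gradient on a hypothetical diagonal quadratic.
# This is NOT the paper's Regularised Gradient Descent algorithm.

def objective(theta, curvatures):
    """f(theta) = 0.5 * sum(c_i * theta_i**2), an assumed test problem."""
    return 0.5 * sum(c * t * t for c, t in zip(curvatures, theta))

def grad(theta, curvatures):
    """Gradient of the quadratic above: c_i * theta_i per coordinate."""
    return [c * t for c, t in zip(curvatures, theta)]

def momentum_step(theta, v, curvatures, lr, mu):
    """Classical momentum: gradient is evaluated at the current point."""
    g = grad(theta, curvatures)
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    theta = [ti + vi for ti, vi in zip(theta, v)]
    return theta, v

def nesterov_step(theta, v, curvatures, lr, mu):
    """Nesterov: gradient is evaluated at the look-ahead point theta + mu * v."""
    lookahead = [ti + mu * vi for ti, vi in zip(theta, v)]
    g = grad(lookahead, curvatures)
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    theta = [ti + vi for ti, vi in zip(theta, v)]
    return theta, v

def run(step, theta0, curvatures, lr=0.01, mu=0.9, n_steps=500):
    """Iterate a step function from theta0 with zero initial velocity."""
    theta, v = list(theta0), [0.0] * len(theta0)
    for _ in range(n_steps):
        theta, v = step(theta, v, curvatures, lr, mu)
    return objective(theta, curvatures)

if __name__ == "__main__":
    curvatures = [1.0, 100.0]  # assumed ill-conditioned example
    theta0 = [1.0, 1.0]
    print("momentum:", run(momentum_step, theta0, curvatures))
    print("nesterov:", run(nesterov_step, theta0, curvatures))
```

The step size and momentum coefficient here are arbitrary choices that keep both methods stable on this toy problem; they say nothing about the settings or results reported in the paper.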
