Nesterov's Accelerated Gradient and Momentum as approximations to Regularised Update Descent
Aleksandar Botev, Guy Lever, David Barber

TL;DR
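This paper introduces a unifying framework for adapting the update direction in gradient-based optimization, re-derives classical momentum and Nesterov's accelerated gradient as special cases, and proposes a new algorithm, Regularised Gradient Descent, that can converge faster than either.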
Contribution
It unifies classical momentum and Nesterov's accelerated gradient under a single framework and introduces Regularised Gradient Descent, which can converge more quickly than both.
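For reference, the two methods being unified are commonly written as below. This is the standard textbook form, with step size ε and momentum coefficient γ; the notation is ours and may differ from the paper's. The only difference between the two is where the gradient is evaluated: classical momentum uses the current iterate θ_t, while Nesterov's method uses the look-ahead point θ_t + γ v_t.

```latex
\begin{aligned}
\text{Classical momentum:}\quad v_{t+1} &= \gamma v_t - \epsilon \nabla f(\theta_t), & \theta_{t+1} &= \theta_t + v_{t+1},\\
\text{Nesterov's accelerated gradient:}\quad v_{t+1} &= \gamma v_t - \epsilon \nabla f(\theta_t + \gamma v_t), & \theta_{t+1} &= \theta_t + v_{t+1}.
\end{aligned}
```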
Findings
Regularised Gradient Descent can converge faster than Nesterov's accelerated gradient and classical momentum.
Provides a new intuitive interpretation of Nesterov's accelerated gradient.
Recovers classical momentum and Nesterov's accelerated gradient as special cases of a common framework.
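One way to see how both updates can arise as approximations to a single regularised update is sketched below. This is an illustrative construction under an assumed objective, not necessarily the exact formulation used in the paper. The update u trades off decreasing f against staying close to the damped previous update γ v_t, with step size ε and momentum coefficient γ (our notation). In both resulting cases, v_{t+1} = u and θ_{t+1} = θ_t + v_{t+1}.

```latex
\begin{aligned}
&\text{Assumed subproblem:} && u_t^{\star} = \arg\min_{u}\; f(\theta_t + u) + \tfrac{1}{2\epsilon}\lVert u - \gamma v_t \rVert^2,\\
&\text{Its gradient:} && g(u) = \nabla f(\theta_t + u) + \tfrac{1}{\epsilon}(u - \gamma v_t),\\
&\text{One step from } u^{(0)} = \gamma v_t\text{:} && u^{(1)} = u^{(0)} - \epsilon\, g(u^{(0)}) = \gamma v_t - \epsilon \nabla f(\theta_t + \gamma v_t) \quad\text{(Nesterov)},\\
&\text{Exact solve with } \nabla f(\theta_t + u) \approx \nabla f(\theta_t)\text{:} && u = \gamma v_t - \epsilon \nabla f(\theta_t) \quad\text{(classical momentum)}.
\end{aligned}
```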
Abstract
We present a unifying framework for adapting the update direction in gradient-based iterative optimization methods. As natural special cases we re-derive classical momentum and Nesterov's accelerated gradient method, lending a new intuitive interpretation to the latter algorithm. We show that a new algorithm, which we term Regularised Gradient Descent, can converge more quickly than either Nesterov's algorithm or the classical momentum algorithm.
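A minimal sketch, assuming NumPy and a toy ill-conditioned quadratic, of the two classical special cases named in the abstract. It uses the standard update forms, not necessarily the paper's notation. The helper `regularised_subproblem_grad` is hypothetical: it is the gradient of an assumed objective f(θ + u) + (1/(2·lr))·‖u − γ v‖², included only to check that one gradient step on it, started from u = γ v, reproduces Nesterov's update. Regularised Gradient Descent itself is not implemented here because its exact update is not given on this page. All function names are illustrative.

```python
import numpy as np


def make_quadratic(A, b):
    """Return f and its gradient for f(theta) = 0.5 theta^T A theta - b^T theta."""
    def f(theta):
        return 0.5 * theta @ A @ theta - b @ theta

    def grad(theta):
        return A @ theta - b

    return f, grad


def momentum_step(theta, v, grad, lr, gamma):
    # Classical (heavy-ball) momentum: gradient taken at the current iterate.
    v_new = gamma * v - lr * grad(theta)
    return theta + v_new, v_new


def nag_step(theta, v, grad, lr, gamma):
    # Nesterov's accelerated gradient: gradient taken at the look-ahead point.
    v_new = gamma * v - lr * grad(theta + gamma * v)
    return theta + v_new, v_new


def regularised_subproblem_grad(u, theta, v, grad, lr, gamma):
    # Hypothetical: gradient w.r.t. u of f(theta + u) + ||u - gamma v||^2 / (2 lr).
    return grad(theta + u) + (u - gamma * v) / lr


def run(step, f, grad, theta0, lr, gamma, iters):
    # Iterate an update rule from theta0 with zero initial velocity.
    theta, v = theta0.copy(), np.zeros_like(theta0)
    losses = [f(theta)]
    for _ in range(iters):
        theta, v = step(theta, v, grad, lr, gamma)
        losses.append(f(theta))
    return theta, losses


# Toy problem with condition number 100 (an assumption, not from the paper).
A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])
f, grad = make_quadratic(A, b)
f_star = f(np.linalg.solve(A, b))

gaps = {}
for name, step in [("momentum", momentum_step), ("nag", nag_step)]:
    _, losses = run(step, f, grad, np.zeros(2), lr=0.01, gamma=0.9, iters=200)
    gaps[name] = losses[-1] - f_star  # suboptimality after 200 steps
    print(name, gaps[name])
```

The step size and momentum coefficient are chosen so both methods are stable on this problem; the printed suboptimality gaps let you compare them directly, but results on a single toy quadratic should not be read as reproducing the paper's experiments.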
