A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights

Weijie Su; Stephen Boyd; Emmanuel J. Candes

arXiv:1503.01243 · stat.ML · October 29, 2015 · J. Mach. Learn. Res. · 545 cites

TL;DR

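This paper derives a second-order ordinary differential equation (ODE) as the continuous-time limit of Nesterov's accelerated gradient method, uses it to analyze the method's behavior, and builds on it to obtain a family of schemes with similar convergence rates and a restarting scheme that provably converges at a linear rate for strongly convex objectives.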
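As a sketch of how the limit arises (our reconstruction, in notation we assume matches the paper's: step size s, smooth convex f, time identification t = k√s with x_k ≈ X(t)), combining the two updates of Nesterov's scheme and Taylor-expanding to first order in √s gives the ODE:

```latex
\begin{aligned}
&x_k = y_{k-1} - s\,\nabla f(y_{k-1}), \qquad
  y_k = x_k + \frac{k-1}{k+2}\,(x_k - x_{k-1}), \qquad y_0 = x_0,\\[4pt]
&\frac{x_{k+1}-x_k}{\sqrt s} = \frac{k-1}{k+2}\,\frac{x_k - x_{k-1}}{\sqrt s} - \sqrt s\,\nabla f(y_k),\\[4pt]
&\frac{x_{k+1}-x_k}{\sqrt s} = \dot X + \tfrac12 \ddot X \sqrt s + o(\sqrt s),\qquad
  \frac{x_k-x_{k-1}}{\sqrt s} = \dot X - \tfrac12 \ddot X \sqrt s + o(\sqrt s),\qquad
  \frac{k-1}{k+2} \approx 1 - \frac{3\sqrt s}{t},\\[4pt]
&\Longrightarrow\ \sqrt s\,\Big(\ddot X + \frac{3}{t}\,\dot X + \nabla f(X)\Big) = o(\sqrt s)
  \ \Longrightarrow\ \ddot X + \frac{3}{t}\,\dot X + \nabla f(X) = 0,
  \qquad X(0) = x_0,\ \dot X(0) = 0.
\end{aligned}
```

The damping coefficient 3/t is inherited directly from the momentum coefficient (k-1)/(k+2) = 1 - 3/(k+2).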

Contribution

It introduces an ODE model of Nesterov's method that serves as an analytical tool, and uses it to design new schemes, including a restarting scheme with a provable linear convergence rate for strongly convex objectives.
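The restarting idea can be illustrated with a minimal sketch. It runs Nesterov's scheme x_k = y_{k-1} - s∇f(y_{k-1}), y_k = x_k + (j-1)/(j+2)(x_k - x_{k-1}) with a momentum counter j, and resets j to 1 when the iterates stop speeding up. The specific rule (restart when ||x_k - x_{k-1}|| decreases, but only after at least `k_min` steps since the last reset), the value of `k_min`, the quadratic test problem, and the names `nesterov_restart` and `demo` are all assumptions for illustration, not the paper's code.

```python
import numpy as np


def nesterov_restart(grad_f, x0, s, n_iter, k_min=10, restart=True):
    """Nesterov's scheme with momentum (j-1)/(j+2) driven by a counter j.

    Without restarts j equals k, recovering y_k = x_k + (k-1)/(k+2)(x_k - x_{k-1}).
    With restart=True, j is reset to 1 (zero momentum from then on) whenever the
    step length ||x_k - x_{k-1}|| drops and at least k_min steps have passed since
    the last reset. This restart rule and k_min are illustrative assumptions.
    """
    x_prev, y = x0.copy(), x0.copy()   # x_{-1} = x_0 = y_0
    prev_step = 0.0                    # ||x_0 - x_{-1}|| = 0
    j, n_restarts = 1, 0
    xs = [x0.copy()]
    for _ in range(n_iter):
        x = y - s * grad_f(y)                      # gradient step from y_{k-1}
        y = x + (j - 1) / (j + 2) * (x - x_prev)   # momentum step with counter j
        step = np.linalg.norm(x - x_prev)
        if restart and j >= k_min and step < prev_step:
            j = 1                                  # iterates slowed down: drop momentum
            n_restarts += 1
        else:
            j += 1
        prev_step, x_prev = step, x
        xs.append(x)
    return np.array(xs), n_restarts


def demo(n_iter=2000):
    """Plain vs. restarted runs on a hypothetical strongly convex quadratic."""
    A = np.diag([1.0, 1e-3])           # f(x) = 0.5 x^T A x, so L = 1 and mu = 1e-3
    f = lambda x: 0.5 * x @ A @ x
    grad_f = lambda x: A @ x
    x0 = np.array([1.0, 1.0])
    plain, _ = nesterov_restart(grad_f, x0, 1.0, n_iter, restart=False)   # s = 1/L
    restarted, n_restarts = nesterov_restart(grad_f, x0, 1.0, n_iter)
    return [f(x) for x in plain], [f(x) for x in restarted], n_restarts


if __name__ == "__main__":
    f_plain, f_rest, n = demo()
    print(f"restarts: {n}, final f plain: {f_plain[-1]:.2e}, restarted: {f_rest[-1]:.2e}")
```

On an ill-conditioned strongly convex problem like this one, the restarted run should reach a much smaller objective value within the same iteration budget, because each restart discards momentum that would otherwise make the iterates overshoot and oscillate; the exact numbers depend on the assumed rule and on `k_min`.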

Findings

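1. The ODE closely approximates Nesterov's scheme.
2. The ODE interpretation yields a family of schemes with convergence rates similar to Nesterov's.
3. Restarting Nesterov's scheme yields an algorithm that provably converges at a linear rate whenever the objective is strongly convex.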
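The first finding (that the ODE closely approximates Nesterov's scheme) can be checked numerically with a small sketch. It runs Nesterov's scheme and integrates X'' + (r/t)X' + ∇f(X) = 0 with classical RK4, then reports the largest gap between x_k and X(k√s). We assume the momentum coefficient (k-1)/(k+r-1), so r = 3 recovers Nesterov's (k-1)/(k+2). The quadratic objective, the integrator's start time `t0` (chosen to sidestep the 1/t singularity), `substeps`, and all function names are illustrative assumptions.

```python
import numpy as np


def nesterov_iterates(grad_f, x0, s, n_steps, r=3.0):
    """x_k = y_{k-1} - s grad f(y_{k-1}),  y_k = x_k + (k-1)/(k+r-1) (x_k - x_{k-1})."""
    xs = [x0.copy()]
    x_prev, y = x0.copy(), x0.copy()            # y_0 = x_0
    for k in range(1, n_steps + 1):
        x = y - s * grad_f(y)
        y = x + (k - 1) / (k + r - 1) * (x - x_prev)
        x_prev = x
        xs.append(x)
    return np.array(xs)


def ode_solution(grad_f, x0, times, r=3.0, t0=1e-2, substeps=20):
    """Approximate X(t) for X'' + (r/t) X' + grad f(X) = 0, X(0) = x0, X'(0) = 0,
    at increasing sample times, using RK4 started from a series approximation."""
    g0 = grad_f(x0)
    # For small t, X(t) ~ x0 - g0 t^2 / (2(r+1)) and X'(t) ~ -g0 t / (r+1).
    t = t0
    x = x0 - g0 * t0**2 / (2 * (r + 1))
    v = -g0 * t0 / (r + 1)

    def rhs(t, x, v):
        return v, -(r / t) * v - grad_f(x)

    out = []
    for T in times:
        if T <= t0:                             # still in the series regime
            out.append(x0 - g0 * T**2 / (2 * (r + 1)))
            continue
        h = (T - t) / substeps
        for _ in range(substeps):
            k1x, k1v = rhs(t, x, v)
            k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
            k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
            k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
            x = x + h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
            v = v + h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
            t += h
        out.append(x.copy())
    return np.array(out)


def max_discrepancy(s, T=10.0, r=3.0):
    """max_k |x_k - X(k sqrt(s))| over k sqrt(s) <= T on a toy quadratic."""
    A = np.diag([1.0, 0.1])                     # hypothetical f(x) = 0.5 x^T A x
    grad_f = lambda x: A @ x
    x0 = np.array([1.0, 1.0])
    n = int(round(T / np.sqrt(s)))
    xs = nesterov_iterates(grad_f, x0, s, n, r)
    Xs = ode_solution(grad_f, x0, np.sqrt(s) * np.arange(n + 1), r)
    return float(np.max(np.abs(xs - Xs)))


if __name__ == "__main__":
    for s in (1e-2, 1e-3, 1e-4):
        print(f"s = {s:g}: max |x_k - X(k sqrt(s))| = {max_discrepancy(s):.4f}")
```

As s shrinks, the printed gap should shrink as well, which is the sense in which the ODE is the limit of the scheme.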

Abstract

We derive a second-order ordinary differential equation (ODE) which is the limit of Nesterov's accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov's scheme and thus can serve as a tool for analysis. We show that the continuous-time ODE allows for a better understanding of Nesterov's scheme. As a byproduct, we obtain a family of schemes with similar convergence rates. The ODE interpretation also suggests restarting Nesterov's scheme, leading to an algorithm that can be rigorously proven to converge at a linear rate whenever the objective is strongly convex.
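The "family of schemes with similar convergence rates" can be motivated in continuous time. Assuming the generalized ODE replaces the constant 3 with a parameter r, the standard Lyapunov-function argument below (reconstructed here, so notation and constants may differ from the paper's) gives an O(1/t²) rate for convex f with minimizer x* whenever r ≥ 3; the inequality step uses convexity, ⟨∇f(X), X - x*⟩ ≥ f(X) - f*.

```latex
\begin{aligned}
&\ddot X + \frac{r}{t}\,\dot X + \nabla f(X) = 0, \qquad X(0) = x_0,\ \dot X(0) = 0,\\[4pt]
&\mathcal E(t) = \frac{2t^2}{r-1}\big(f(X) - f^\star\big)
  + (r-1)\,\Big\|X + \frac{t}{r-1}\,\dot X - x^\star\Big\|^2,\\[4pt]
&\dot{\mathcal E}(t) = \frac{4t}{r-1}\big(f(X) - f^\star\big) - 2t\,\big\langle \nabla f(X),\, X - x^\star \big\rangle
  \le 2t\Big(\frac{2}{r-1} - 1\Big)\big(f(X) - f^\star\big) \le 0 \quad (r \ge 3),\\[4pt]
&\Longrightarrow\ f\big(X(t)\big) - f^\star \le \frac{(r-1)\,\mathcal E(0)}{2t^2}
  = \frac{(r-1)^2\,\|x_0 - x^\star\|^2}{2t^2}.
\end{aligned}
```

For r = 3 this reduces to f(X(t)) - f* ≤ 2||x_0 - x*||²/t², the continuous-time counterpart of Nesterov's O(1/k²) rate under t = k√s.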

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research