Optimal convergence rates for Nesterov acceleration

Jean François Aujol (IMB), Charles Dossal (IMT), Aude Rondepierre (IMT, LAAS-ROC)

arXiv:1805.05719 · math.OC · July 9, 2019 · SIAM J. Optim.

TL;DR

This paper studies the ODE associated with Nesterov acceleration. It shows that additional geometric conditions on the objective, such as the Łojasiewicz property, yield convergence rates better than $O(1/t^{2})$, and that the classical Nesterov scheme can converge more slowly than gradient descent on sharp functions.

Contribution

The paper proves new, optimal convergence rates for the ODE associated with Nesterov acceleration. The rates depend on geometric properties of the objective, such as the Łojasiewicz property, and they expose limitations of the classical scheme.
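For orientation, the two objects named above are usually written as follows. This is a sketch in standard notation, not a quotation from the paper: the friction parameter $\alpha$, the constants $c$ and $\theta$, and the minimum value $F^{*}$ are symbols assumed here for illustration.

```latex
% ODE commonly associated with Nesterov acceleration
% (alpha > 0 is an assumed name for the friction parameter)
\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \nabla F(x(t)) = 0

% Lojasiewicz inequality in its usual gradient form, assumed to hold
% near the minimizers of F, with c > 0 and exponent theta in [0, 1)
\|\nabla F(x)\| \;\ge\; c\,\bigl(F(x) - F^{*}\bigr)^{\theta}
```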

Findings

01

Under additional geometric conditions, such as the Łojasiewicz property, rates better than $O(1/t^{2})$ can be obtained.

02

Classical Nesterov may perform worse than gradient descent on sharp functions.

03

Convergence rates depend on the geometry of the objective function.

Abstract

In this paper, we study the behavior of solutions of the ODE associated to Nesterov acceleration. It is well-known since the pioneering work of Nesterov that the rate of convergence $O(1/t^{2})$ is optimal for the class of convex functions with Lipschitz gradient. In this work, we show that better convergence rates can be obtained with some additional geometrical conditions, such as the Łojasiewicz property. More precisely, we prove the optimal convergence rates that can be obtained depending on the geometry of the function $F$ to minimize. The convergence rates are new, and they shed new light on the behavior of Nesterov acceleration schemes. We prove in particular that the classical Nesterov scheme may provide convergence rates that are worse than the classical gradient descent scheme on sharp functions: for instance, the convergence rate for strongly convex functions is not geometric for…
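The comparison with gradient descent can be tried on a toy problem. The sketch below is not the paper's experiment or analysis: it assumes the one-dimensional strongly convex objective $F(x) = x^{2}/2$, a step size of 0.1, 500 iterations, and the common discrete momentum coefficient $(k-1)/(k+2)$. All function and variable names are hypothetical.

```python
# Toy comparison, assuming F(x) = 0.5 * x**2 (strongly convex, minimum value 0).
# The step size, iteration count, and momentum rule are illustrative choices.

def objective(x):
    return 0.5 * x * x

def grad(x):
    return x

def gradient_descent(x0, step, n_iters):
    """Plain gradient descent; returns the objective value at each iterate."""
    x = x0
    values = [objective(x)]
    for _ in range(n_iters):
        x = x - step * grad(x)
        values.append(objective(x))
    return values

def nesterov(x0, step, n_iters):
    """Nesterov-type scheme with momentum (k - 1) / (k + 2) (assumed variant)."""
    x_prev, x = x0, x0
    values = [objective(x)]
    for k in range(1, n_iters + 1):
        beta = (k - 1) / (k + 2)
        y = x + beta * (x - x_prev)          # extrapolation step
        x_prev, x = x, y - step * grad(y)    # gradient step taken at y
        values.append(objective(x))
    return values

gd_values = gradient_descent(1.0, 0.1, 500)
nesterov_values = nesterov(1.0, 0.1, 500)

# The accelerated iterates oscillate, so compare against the worst value
# over the last 50 iterations rather than a single iterate.
print("gradient descent, final F:", gd_values[-1])
print("Nesterov, max F over last 50 iterations:", max(nesterov_values[-50:]))
```

On this example, gradient descent ends with the smaller objective value, in line with the abstract's remark about sharp functions. The sketch illustrates that qualitative behavior only and does not reproduce the rates proved in the paper.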
