Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule

Jason M. Altschuler; Pablo A. Parrilo

arXiv:2309.07879 · math.OC · March 31, 2025 · 2 cites

Open Access

TL;DR

This paper introduces the Silver Stepsize Schedule, a stepsize strategy that accelerates the convergence of gradient descent on convex functions without altering the algorithm, achieving rates intermediate between those of unaccelerated gradient descent and Nesterov's accelerated method.

Contribution

The paper proposes a fully explicit, non-monotonic, fractal-like stepsize schedule that improves convergence rates, bridging the gap between unaccelerated and accelerated gradient descent.

Findings

01

Achieves convergence in approximately k^{0.7864} iterations for strongly convex functions, where k is the condition number.

02

Provides a recursive, explicit construction of the Silver Stepsize Schedule.

03

Conjectures, with partial supporting evidence, that these rates are optimal among all possible stepsize schedules.

Abstract

Can we accelerate convergence of gradient descent without changing the algorithm -- just by carefully choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in $k^{\log_{\rho} 2} \approx k^{0.7864}$ iterations, where $\rho = 1 + \sqrt{2}$ is the silver ratio and $k$ is the condition number. This is intermediate between the textbook unaccelerated rate $k$ and the accelerated rate $\sqrt{k}$ due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous accelerated rate $\varepsilon^{-\log_{\rho} 2} \approx \varepsilon^{-0.7864}$. We conjecture and provide partial evidence that these rates are optimal among all possible stepsize schedules. The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is…
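As a quick sanity check on the exponent quoted in the abstract, the following minimal sketch computes $\log_{\rho} 2$ for the silver ratio $\rho = 1 + \sqrt{2}$ and compares the three iteration scalings (unaccelerated, silver, accelerated) for a few condition numbers. The function name `iteration_scalings` and the sample condition numbers are illustrative assumptions, not taken from the paper, and the values ignore constants and any dependence on the target accuracy.

```python
import math

# Silver ratio rho = 1 + sqrt(2), as defined in the abstract.
RHO = 1.0 + math.sqrt(2.0)

# Exponent log_rho(2), computed by change of base.
SILVER_EXPONENT = math.log(2.0) / math.log(RHO)


def iteration_scalings(k: float) -> dict:
    """Return the three scalings in the condition number k.

    Hypothetical helper for illustration only: it reports the leading
    scaling of each rate, without constants or accuracy-dependent factors.
    """
    return {
        "unaccelerated": k,               # textbook gradient descent
        "silver": k ** SILVER_EXPONENT,   # Silver Stepsize Schedule
        "accelerated": math.sqrt(k),      # Nesterov-type acceleration
    }


if __name__ == "__main__":
    print(f"log_rho(2) = {SILVER_EXPONENT:.4f}")
    for k in (1e2, 1e4, 1e6):  # sample condition numbers (assumed)
        s = iteration_scalings(k)
        print(
            f"k = {k:.0e}: unaccelerated {s['unaccelerated']:.3g}, "
            f"silver {s['silver']:.3g}, accelerated {s['accelerated']:.3g}"
        )
```

For any condition number greater than 1, the silver scaling falls strictly between the other two, which is the sense in which the rate is described as intermediate.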

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Neural Networks and Applications