Composing Optimized Stepsize Schedules for Gradient Descent
Benjamin Grimmer, Kevin Shu, Alex L. Wang

TL;DR
This paper develops a comprehensive theory for composing and optimizing stepsize schedules in gradient descent, leading to new schedules that outperform previous methods and potentially achieve minimax optimality.
Contribution
It introduces a general framework for composing stepsize schedules, constructs highly optimized sequences, and extends recent advances to broader settings.
Findings
Constructed optimized stepsize schedules of every length, generalizing the exponentially spaced silver stepsizes.
Improved on prior convergence rates for the final objective gap and gradient norm by constant factors.
Extended dynamic gradient norm minimizing schedules to objective gap minimization.
Abstract
Recent works by Altschuler and Parrilo and the authors have shown that it is possible to accelerate the convergence of gradient descent on smooth convex functions, even without momentum, just by picking special stepsizes. In this paper, we provide a general theory for composing stepsize schedules capturing all recent advances in this area and more. We propose three notions of "composable" stepsize schedules with elementary associated composition operations for combining them. From these operations, in addition to recovering recent works, we construct three highly optimized sequences of stepsize schedules. We first construct optimized stepsize schedules of every length generalizing the exponentially spaced silver stepsizes. We then construct highly optimized stepsize schedules for minimizing final objective gap or gradient norm, improving on prior rates by constants and, more…
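To make "picking special stepsizes" concrete, here is a minimal sketch of plain gradient descent driven by a precomputed stepsize schedule. It is not one of the schedules constructed in this paper. It assumes the silver stepsize formula usually attributed to Altschuler and Parrilo, h_t = 1 + ρ^(ν(t) − 1), where ρ = 1 + √2 and ν(t) is the 2-adic valuation of t, and the quadratic test problem and all function names are hypothetical choices made for illustration.

```python
import math

RHO = 1.0 + math.sqrt(2.0)  # silver ratio


def silver_schedule(k):
    """Normalized silver stepsizes of length 2**k - 1.

    Assumed formula (not taken from this page): h_t = 1 + RHO**(nu(t) - 1),
    where nu(t) is the largest power of 2 dividing t.
    """
    steps = []
    for t in range(1, 2 ** k):
        nu = (t & -t).bit_length() - 1  # 2-adic valuation of t
        steps.append(1.0 + RHO ** (nu - 1))
    return steps


def gradient_descent(grad, x0, steps, L):
    """Run x <- x - (h / L) * grad(x) for each normalized stepsize h."""
    x = list(x0)
    for h in steps:
        g = grad(x)
        x = [xi - (h / L) * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(d_i * x_i**2), which is
# convex and L-smooth with L = max(d_i) = 1.
D = [1.0, 0.1, 0.01]


def f(x):
    return 0.5 * sum(d * xi * xi for d, xi in zip(D, x))


def grad_f(x):
    return [d * xi for d, xi in zip(D, x)]


x0 = [1.0, 1.0, 1.0]
steps = silver_schedule(4)  # 15 stepsizes
x_silver = gradient_descent(grad_f, x0, steps, L=1.0)
x_const = gradient_descent(grad_f, x0, [1.0] * len(steps), L=1.0)

print("objective with silver steps:  ", f(x_silver))
print("objective with constant steps:", f(x_const))
```

Most steps in this schedule are short and a few are much longer than the classical 1/L step, which is the exponential spacing the abstract refers to. A single quadratic only illustrates the mechanics; it says nothing about the worst-case rates the paper analyzes.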
Taxonomy
Topics: Manufacturing Process and Optimization · Advanced Numerical Analysis Techniques · Computational Geometry and Mesh Generation
