Accelerated Gradient Descent via Long Steps
Benjamin Grimmer, Kevin Shu, Alex L. Wang

TL;DR
This paper proves the first accelerated convergence rate for gradient descent in smooth convex optimization by using a nonconstant sequence of increasing step sizes, surpassing the traditional $O(1/T)$ rate.
Contribution
It establishes a new $O(1/T^{1.0564})$ convergence rate for gradient descent with long, nonperiodic steps, advancing the understanding of acceleration in convex optimization.
Findings
Proves an $O(1/T^{1.0564})$ convergence rate for smooth convex minimization.
Shows that long, increasing step sizes can accelerate gradient descent.
Extends the theory to strongly convex optimization with similar acceleration results.
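The idea behind the findings above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's construction: the periodic pattern `PATTERN = (1.5, 4.9, 1.5)`, the Huber test function, the starting points, and all helper names are chosen for this sketch, whereas the paper builds a nonperiodic, increasing schedule. Stepsizes are normalized by the smoothness constant `L`, so `h = 1` is the textbook step $1/L$.

```python
from typing import Callable, List, Sequence


def gradient_descent(grad: Callable[[float], float], x0: float,
                     steps: Sequence[float], L: float = 1.0) -> List[float]:
    """Run x_{k+1} = x_k - (h_k / L) * grad(x_k); return every iterate."""
    xs = [x0]
    for h in steps:
        xs.append(xs[-1] - (h / L) * grad(xs[-1]))
    return xs


def huber(x: float) -> float:
    """1-smooth convex Huber function: quadratic near 0, linear far away."""
    return 0.5 * x * x if abs(x) <= 1.0 else abs(x) - 0.5


def huber_grad(x: float) -> float:
    """Gradient of huber: x clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x))


# Illustrative periodic pattern (an assumption of this sketch, not the
# paper's schedule). The middle step exceeds 2, the classical limit for
# a constant stepsize.
PATTERN = (1.5, 4.9, 1.5)
T = 9  # total number of gradient steps (three repetitions of PATTERN)

const_steps = [1.0] * T
long_steps = list(PATTERN) * (T // len(PATTERN))

# Far from the minimizer the Huber gradient is flat, so longer steps
# simply cover more ground per iteration.
x_const = gradient_descent(huber_grad, 30.0, const_steps)
x_long = gradient_descent(huber_grad, 30.0, long_steps)
f_const = huber(x_const[-1])
f_long = huber(x_long[-1])
print("constant steps:", f_const)
print("long steps:    ", f_long)

# On the quadratic f(x) = x^2 / 2 (gradient x, L = 1) the long step
# overshoots, but the short steps around it pull the iterate back.
xs_quad = gradient_descent(lambda x: x, 1.0, PATTERN)
print("quadratic iterates:", xs_quad)
```

On the quadratic, the iterate temporarily moves farther from the minimizer after the long step, yet ends the pattern closer than it started. The objective need not decrease at every iteration, so guarantees for such schedules are stated over a whole sequence of steps rather than step by step. The comparison on the Huber function is a single instance and says nothing about worst-case rates, which are what the paper analyzes.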
Abstract
Recently, Grimmer [1] showed that, for smooth convex optimization, utilizing longer steps periodically improves gradient descent's textbook convergence guarantees by constant factors, and conjectured that an accelerated rate strictly faster than $O(1/T)$ could be possible. Here we prove such a big-O gain, establishing gradient descent's first accelerated convergence rate in this setting. Namely, we prove an $O(1/T^{1.0564})$ rate for smooth convex minimization by utilizing a nonconstant nonperiodic sequence of increasingly large stepsizes. It remains open if one can achieve the rate conjectured by Das Gupta et al. [2] or the optimal gradient method rate of $O(1/T^2)$. Big-O convergence rate accelerations from long steps follow from our theory for strongly convex optimization, similar to but somewhat weaker than those concurrently developed by Altschuler and Parrilo…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research
