The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

TL;DR
This paper investigates the final iterate behavior of SGD in streaming least squares regression, showing that geometrically decaying step sizes significantly improve convergence rates over polynomial decay, approaching minimax optimality.
Contribution
It analyzes the step decay schedule for SGD, in which the learning rate is cut geometrically at fixed intervals, and shows that it gives near-optimal convergence of the final iterate in streaming least squares problems, outperforming polynomial decay schemes.
Findings
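Step decay schedules achieve near-minimax-optimal rates for SGD's final iterate.
Polynomially decaying step sizes are sub-optimal for final iterate convergence, even when the time horizon T is known in advance.
In the anytime setting, where T is not known in advance, SGD's final iterate behaves poorly regardless of the step size schedule.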
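The schedule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (step_decay_lr, sgd_final_iterate), the choice of roughly log2(T) phases with the rate halved between phases, the Gaussian data model with identity covariance, and the values of eta0, dim, and noise_std are all hypothetical choices made for the example.

```python
import math
import numpy as np


def step_decay_lr(t, T, eta0):
    """Learning rate at step t (0-indexed) of a T-step run.

    Assumption: the run is split into about log2(T) equal phases and the
    rate is halved at each phase boundary (a geometric decay).
    """
    num_phases = max(1, int(math.log2(T)))
    phase_len = math.ceil(T / num_phases)
    phase = t // phase_len
    return eta0 / (2 ** phase)


def sgd_final_iterate(lr_fn, T, dim=5, noise_std=0.1, seed=0):
    """Run streaming least squares SGD and return the final iterate's excess risk.

    Each step draws one fresh sample (x, y) with y = <x, w_star> + noise,
    so no sample is ever reused. Inputs are standard Gaussian, so the
    excess risk is 0.5 * ||w - w_star||^2.
    """
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=dim)  # hypothetical ground-truth weights
    w = np.zeros(dim)
    for t in range(T):
        x = rng.normal(size=dim)
        y = x @ w_star + noise_std * rng.normal()
        grad = (x @ w - y) * x  # gradient of 0.5 * (x @ w - y) ** 2
        w -= lr_fn(t) * grad
    return 0.5 * float(np.sum((w - w_star) ** 2))


T = 1024
eta0 = 0.05  # assumed small enough for stability on this toy problem
risk_step = sgd_final_iterate(lambda t: step_decay_lr(t, T, eta0), T)
risk_poly = sgd_final_iterate(lambda t: eta0 / math.sqrt(1 + t), T)
print(f"step decay final-iterate excess risk:    {risk_step:.3e}")
print(f"1/sqrt(t) decay final-iterate excess risk: {risk_poly:.3e}")
```

Note that step_decay_lr takes T as an argument: the schedule needs the horizon up front, which matches the known-horizon setting in which the near-optimal rate is stated. A single run on a toy problem like this is only illustrative and should not be read as evidence for the paper's rates.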
Abstract
Minimax optimal convergence rates for classes of stochastic convex optimization problems are well characterized, where the majority of results utilize iterate averaged stochastic gradient descent (SGD) with polynomially decaying step sizes. In contrast, SGD's final iterate behavior has received much less attention despite its widespread use in practice. Motivated by this observation, this work provides a detailed study of the following question: what rate is achievable using the final iterate of SGD for the streaming least squares regression problem with and without strong convexity? First, this work shows that even if the time horizon T (i.e. the number of iterations SGD is run for) is known in advance, SGD's final iterate behavior with any polynomially decaying learning rate scheme is highly sub-optimal compared to the minimax rate (by a condition number factor in the strongly…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: Step Decay · Stochastic Gradient Descent
