Step-Size Stability in Stochastic Optimization: A Theoretical Perspective
Fabian Schaipp, Robert M. Gower, Adrien Taylor

TL;DR
This paper analyzes how sensitive stochastic optimization methods are to the choice of step size. It shows that adaptive methods are more robust than SGD and that the resulting theoretical bounds qualitatively track empirical performance.
Contribution
It identifies a key quantity that measures each method's step-size sensitivity. Using this quantity, it shows that adaptive methods are provably more robust than SGD on convex problems. Experiments suggest the same trend holds for nonconvex problems.
Findings
Adaptive methods such as SPS and NGN are more robust than SGD to the choice of step size.
The theoretical bound qualitatively mirrors actual performance as a function of the step size, even for nonconvex problems.
For convex problems, the step-size sensitivity quantity directly enters the method's suboptimality bound.
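For intuition, below are the forms of SGD, SPS_max and NGN as they are commonly stated in the literature. This is an assumption here: the paper's exact variants and its sensitivity quantity are not reproduced. All three methods take a step x^{k+1} = x^k − γ_k ∇f_{i_k}(x^k) and differ only in the step size γ_k. In these formulas, γ is the user-chosen base step size, f_{i_k}^* is a lower bound on f_{i_k}, c > 0 is the SPS constant, and NGN assumes f_i ≥ 0.

```latex
\begin{aligned}
\text{SGD:}\quad & \gamma_k = \gamma,\\
\text{SPS}_{\max}:\quad & \gamma_k = \min\Big\{\gamma,\ \frac{f_{i_k}(x^k) - f_{i_k}^*}{c\,\|\nabla f_{i_k}(x^k)\|^2}\Big\},\\
\text{NGN:}\quad & \gamma_k = \frac{\gamma}{1 + \frac{\gamma}{2 f_{i_k}(x^k)}\,\|\nabla f_{i_k}(x^k)\|^2},\\[4pt]
\lim_{\gamma\to\infty}\gamma_k^{\text{SPS}_{\max}} &= \frac{f_{i_k}(x^k) - f_{i_k}^*}{c\,\|\nabla f_{i_k}(x^k)\|^2},
\qquad
\lim_{\gamma\to\infty}\gamma_k^{\text{NGN}} = \frac{2 f_{i_k}(x^k)}{\|\nabla f_{i_k}(x^k)\|^2}.
\end{aligned}
```

Taking γ → ∞ gives one intuitive reason for the robustness claim. The SGD step grows without bound, while the SPS_max and NGN steps approach finite values that depend on the current iterate.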
Abstract
We present a theoretical analysis of stochastic optimization methods in terms of their sensitivity with respect to the step size. We identify a key quantity that, for each method, describes how the performance degrades as the step size becomes too large. For convex problems, we show that this quantity directly impacts the suboptimality bound of the method. Most importantly, our analysis provides direct theoretical evidence that adaptive step-size methods, such as SPS or NGN, are more robust than SGD. This allows us to quantify the advantage of these adaptive methods beyond empirical evaluation. Finally, we show through experiments that our theoretical bound qualitatively mirrors the actual performance as a function of the step size, even for nonconvex problems.
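The following minimal Python sketch makes step-size sensitivity concrete. It is an illustration, not the authors' experimental code. It sweeps the base step size γ for SGD, SPS_max and NGN on a synthetic, consistent least-squares problem and prints the final loss for each method. Several details are assumptions:

- the helper names `step_size` and `run`;
- the problem sizes;
- the SPS_max variant, with c = 1 and lower bound f_i^* = 0;
- the NGN formula γ / (1 + γ‖∇f_i‖²/(2f_i)).

The last two follow the commonly stated forms of these methods.

```python
import numpy as np


def step_size(method, gamma, loss, grad_sq):
    """Effective step size of one stochastic step (hypothetical helper).

    loss    -- f_i(x) >= 0 for the sampled term
    grad_sq -- squared norm of the gradient of f_i at x
    """
    if method == "sgd":
        return gamma  # fixed step, independent of the iterate
    if loss <= 0.0 or grad_sq <= 0.0:
        return 0.0  # sampled term already minimized: no step needed
    if method == "sps":
        # SPS_max with c = 1 and lower bound f_i^* = 0 (assumed variant)
        return min(gamma, loss / grad_sq)
    if method == "ngn":
        # NGN step: gamma / (1 + gamma * ||grad f_i||^2 / (2 f_i))
        return gamma / (1.0 + gamma * grad_sq / (2.0 * loss))
    raise ValueError(f"unknown method: {method}")


def run(method, gamma, A, b, epochs=20, seed=0):
    """Minimize mean_i 0.5 * (a_i^T x - b_i)^2 with one method; return the final loss."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    with np.errstate(over="ignore", invalid="ignore"):  # SGD may blow up for large gamma
        for _ in range(epochs):
            for i in rng.permutation(A.shape[0]):
                r = A[i] @ x - b[i]  # residual of the sampled row
                g = r * A[i]  # gradient of 0.5 * r^2
                eta = step_size(method, gamma, 0.5 * r**2, g @ g)
                x = x - eta * g
                if not np.all(np.isfinite(x)):
                    return float("inf")  # diverged
    return float(0.5 * np.mean((A @ x - b) ** 2))


# Consistent (interpolating) least-squares problem, so every f_i has minimum 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = A @ rng.standard_normal(10)

for gamma in [0.01, 0.1, 1.0, 10.0, 100.0]:
    losses = {m: run(m, gamma, A, b) for m in ("sgd", "sps", "ngn")}
    print(f"gamma={gamma:>6}: " + ", ".join(f"{m}={v:.2e}" for m, v in losses.items()))
```

Consider f_i(x) = ½(a_iᵀx − b_i)². Here the SPS_max step is at most 1/(2‖a_i‖²) and the NGN step always stays below 1/‖a_i‖², so neither can overshoot, however large γ is. SGD with a fixed γ typically diverges once γ‖a_i‖² exceeds 2. For small γ, all three methods take nearly the same step.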
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Simulation Techniques and Applications · Reinforcement Learning in Robotics
