TL;DR
This paper derives bias-optimal convergence bounds for SGD from a Lyapunov analysis whose parameters are designed with the Performance Estimation Problem framework. The bounds hold for the full range of constant step-sizes without additional variance assumptions.
Contribution
It introduces a Lyapunov-based analysis of SGD that yields bias-optimal bounds, meaning the bias term coincides with the worst-case rate of deterministic gradient descent.
Findings
Bounds are valid for all constant step-sizes in (0,2)
Lyapunov energy construction yields sharp convergence guarantees
Numerical evidence supports the optimality of the variance terms
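As a rough illustration of what a Lyapunov energy argument looks like, the sketch below numerically checks a classical textbook energy for deterministic gradient descent with step-size 1/L, namely E_k = k (f(x_k) - f*) + (L/2) ||x_k - x*||^2. This is not the energy constructed in the paper. The quadratic objective, the eigenvalues in `h`, the starting point, and the function name `lyapunov_energies` are all assumptions made for the example.

```python
import numpy as np

def lyapunov_energies(h, x0, n_steps):
    """Energies E_k = k*(f(x_k) - f*) + (L/2)*||x_k - x*||^2 along gradient descent.

    Toy setting (assumption): f(x) = 0.5 * sum(h * x**2), so x* = 0 and f* = 0.
    The step-size is 1/L with L = max(h).
    """
    L = h.max()
    x = x0.copy()
    energies = []
    for k in range(n_steps + 1):
        f_gap = 0.5 * np.sum(h * x**2)                  # f(x_k) - f*
        energies.append(k * f_gap + 0.5 * L * np.sum(x**2))
        x = x - (h * x) / L                             # gradient step with step-size 1/L
    return np.array(energies)

h = np.array([0.01, 0.3, 1.0])      # hypothetical Hessian eigenvalues
x0 = np.array([1.0, -2.0, 0.5])     # hypothetical starting point
n_steps = 100

E = lyapunov_energies(h, x0, n_steps)

# The energy should never increase (small tolerance for rounding).
is_monotone = bool(np.all(np.diff(E) <= 1e-12))

# Recompute the final iterate in closed form to check the resulting rate.
x_final = x0 * (1.0 - h / h.max()) ** n_steps
final_gap = 0.5 * np.sum(h * x_final**2)
rate_bound = E[0] / n_steps         # L * ||x0 - x*||^2 / (2k)
print(is_monotone, final_gap <= rate_bound)
```

Because the energy never increases, k (f(x_k) - f*) is at most E_0, which gives the familiar O(1/k) rate f(x_k) - f* <= L ||x_0 - x*||^2 / (2k).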
Abstract
The non-asymptotic analysis of Stochastic Gradient Descent (SGD) typically yields bounds that decompose into a bias term and a variance term. In this work, we focus on the bias component and study the extent to which SGD can match the optimal convergence behavior of deterministic gradient descent. Assuming only (strong) convexity and smoothness of the objective, we derive new bounds that are bias-optimal, in the sense that the bias term coincides with the worst-case rate of gradient descent. Our results hold for the full range of constant step-sizes, including critical and large step-size regimes that were previously unexplored without additional variance assumptions. The bounds are obtained through the construction of a simple Lyapunov energy whose monotonicity yields sharp convergence guarantees. To design the parameters of this energy, we employ the Performance…
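The bias/variance decomposition mentioned in the abstract can be seen exactly in a toy setting. The sketch below is not the paper's analysis: it assumes a diagonal quadratic objective and additive Gaussian gradient noise, in which case the expected squared error of SGD splits into the squared error of noiseless gradient descent (the bias) plus a noise-driven remainder (the variance). It also assumes the step-size is expressed as a multiple of 1/L, so `step_times_L` lies in (0, 2). The names `run_sgd` and `toy_decomposition` and all numeric values are hypothetical.

```python
import numpy as np

def run_sgd(h, x0, step, sigma, n_steps, rng):
    """SGD on f(x) = 0.5 * sum(h * x**2) with additive Gaussian gradient noise."""
    x = x0.copy()
    for _ in range(n_steps):
        grad = h * x                                    # exact gradient
        noise = sigma * rng.standard_normal(x.shape)    # toy noise model (assumption)
        x = x - step * (grad + noise)
    return x

def toy_decomposition(step_times_L=1.5, sigma=0.1, n_steps=50, n_runs=2000, seed=0):
    """Return (bias, variance, mse) for the squared distance to the minimizer x* = 0."""
    rng = np.random.default_rng(seed)
    h = np.array([0.1, 1.0])         # hypothetical eigenvalues: mu = 0.1, L = 1.0
    x0 = np.array([1.0, 1.0])
    step = step_times_L / h.max()    # large step-size regime when step_times_L > 1

    # Bias: the noiseless run is plain gradient descent.
    x_gd = run_sgd(h, x0, step, 0.0, n_steps, rng)
    bias = float(np.sum(x_gd**2))

    # Monte Carlo estimate of E||x_k - x*||^2 under noise.
    finals = np.array([run_sgd(h, x0, step, sigma, n_steps, rng) for _ in range(n_runs)])
    mse = float(np.mean(np.sum(finals**2, axis=1)))

    return bias, mse - bias, mse

bias, variance, mse = toy_decomposition()
print(f"bias={bias:.3e}  variance={variance:.3e}  mse={mse:.3e}")
```

In this toy model the bias term is exactly what gradient descent would achieve with the same step-size, and the variance term does not vanish for a constant step-size. The paper addresses the much harder question of worst-case bounds over all (strongly) convex smooth objectives.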
