Unified Optimal Analysis of the (Stochastic) Gradient Method
Sebastian U. Stich

TL;DR
This paper provides a unified analysis of stochastic and deterministic gradient methods, establishing convergence rates under mild smoothness assumptions and matching the best known iteration complexities.
Contribution
It offers a simple proof for convergence of SGD under milder smoothness conditions and unifies the analysis for both stochastic and deterministic gradient methods.
Findings
Convergence rate for SGD of $\tilde{O}\left(LR^2 e^{-\frac{\mu}{4L}T} + \frac{\sigma^2}{\mu T}\right)$ after $T$ iterations.
Recovery of exponential convergence in the interpolation setting where $\sigma^2=0$.
Match with best known iteration complexity bounds for GD and SGD.
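As a rough worked step, the rate can be turned into an iteration count. Here $\mu$ is the strong-convexity parameter, $L$ the smoothness constant, and $\epsilon$ is a target accuracy introduced only for this illustration. Treating the constants and logarithmic factors hidden by $\tilde{O}$ as 1 (an assumption, not a statement from the paper), it suffices that each term is at most $\epsilon/2$:

$$LR^2 e^{-\frac{\mu}{4L}T} \le \frac{\epsilon}{2} \iff T \ge \frac{4L}{\mu}\log\frac{2LR^2}{\epsilon}, \qquad \frac{\sigma^2}{\mu T} \le \frac{\epsilon}{2} \iff T \ge \frac{2\sigma^2}{\mu\epsilon}.$$

Under that simplification, roughly $T \ge \max\left\{\frac{4L}{\mu}\log\frac{2LR^2}{\epsilon},\ \frac{2\sigma^2}{\mu\epsilon}\right\}$ iterations are enough. When $\sigma^2 = 0$ only the logarithmic term remains, which is the exponential (linear) convergence regime.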
Abstract
In this note we give a simple proof for the convergence of stochastic gradient (SGD) methods on $\mu$-convex functions under a (milder than standard) $L$-smoothness assumption. We show that for carefully chosen stepsizes SGD converges after $T$ iterations as $\tilde{O}\left(LR^2 e^{-\frac{\mu}{4L}T} + \frac{\sigma^2}{\mu T}\right)$, where $\sigma^2$ measures the variance in the stochastic noise. For deterministic gradient descent (GD) and SGD in the interpolation setting we have $\sigma^2 = 0$ and we recover the exponential convergence rate. The bound matches the best known iteration complexity of GD and SGD, up to constants.
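The two regimes in the abstract can be illustrated with a minimal sketch. It runs SGD on a toy quadratic that is 1-strongly convex and 10-smooth, once without noise and once with additive Gaussian gradient noise. The function name `sgd_quadratic`, the test problem, and the constant stepsize $1/(2L)$ are all assumptions made for this illustration; the paper's carefully chosen stepsize schedule is not reproduced here.

```python
import numpy as np

def sgd_quadratic(mu, L, sigma, T, seed=0):
    """Constant-stepsize SGD on f(x) = 0.5 * (mu * x[0]**2 + L * x[1]**2).

    Hypothetical toy problem: f is mu-strongly convex and L-smooth, and its
    minimizer is the origin with f = 0. Returns f at the final iterate.
    """
    rng = np.random.default_rng(seed)
    h = np.array([mu, L], dtype=float)  # diagonal of the Hessian
    x = np.array([1.0, 1.0])            # starting point
    gamma = 1.0 / (2.0 * L)             # illustrative stepsize (assumption)
    for _ in range(T):
        noise = sigma * rng.standard_normal(2)  # additive gradient noise
        x = x - gamma * (h * x + noise)         # stochastic gradient step
    return 0.5 * float(np.sum(h * x**2))

# sigma = 0: deterministic GD, the error shrinks geometrically with T.
f_det_short = sgd_quadratic(mu=1.0, L=10.0, sigma=0.0, T=100)
f_det_long = sgd_quadratic(mu=1.0, L=10.0, sigma=0.0, T=200)

# sigma > 0: with a constant stepsize the error stalls at a noise floor.
f_noisy = sgd_quadratic(mu=1.0, L=10.0, sigma=1.0, T=200)

print(f_det_short, f_det_long, f_noisy)
```

With a constant stepsize the noisy run plateaus at a level proportional to the stepsize times $\sigma^2$, which is why decreasing or otherwise tuned stepsizes are needed to reach a $\sigma^2/(\mu T)$ term.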
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems
Methods: Stochastic Gradient Descent
