Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization
Zaiwei Chen, Shancong Mou, and Siva Theja Maguluri

TL;DR
This paper characterizes the asymptotic stationary distribution of constant stepsize SGD algorithms as the stepsize approaches zero, revealing conditions under which it converges to a Gaussian distribution and exploring non-Gaussian behaviors.
Contribution
The work provides a novel asymptotic analysis of the stationary distribution of constant stepsize stochastic approximation algorithms, including explicit characterizations and numerical insights.
Findings
Limiting distribution is Gaussian under certain conditions.
Scaling factor for the distribution may differ from 1/√α.
Numerical experiments suggest non-Gaussian limits beyond classical CLT assumptions.
Abstract
Stochastic approximation (SA) and stochastic gradient descent (SGD) algorithms are workhorses of modern machine learning. Their constant stepsize variants are preferred in practice due to fast convergence behavior. However, constant stepsize stochastic iterative algorithms do not converge asymptotically to the optimal solution, but instead have a stationary distribution, which in general cannot be analytically characterized. In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution, in the limit when the constant stepsize goes to zero. Specifically, we consider the following three settings: (1) SGD algorithms with smooth and strongly convex objective, (2) linear SA algorithms involving a Hurwitz matrix, and (3) nonlinear SA algorithms involving a contractive operator. When the iterate is scaled by 1/√α, where α is the…
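The 1/√α scaling can be illustrated with a toy simulation. The sketch below is not taken from the paper: it assumes a one-dimensional quadratic objective f(x) = a·x²/2 with additive Gaussian gradient noise of standard deviation sigma, and the function names are hypothetical. In this special case the stationary variance of the iterate scaled by 1/√α has the closed form σ²/(2a − αa²), which tends to σ²/(2a) as α goes to zero.

```python
import random


def exact_scaled_variance(alpha, a=1.0, sigma=1.0):
    """Stationary variance of x / sqrt(alpha) for the toy recursion
    x_{k+1} = x_k - alpha * (a * x_k + sigma * w_k), with w_k ~ N(0, 1).

    The stationary variance v of x solves
    v = (1 - alpha*a)**2 * v + alpha**2 * sigma**2,
    so v / alpha = sigma**2 / (2*a - alpha*a**2). Requires 0 < alpha < 2/a.
    """
    return sigma**2 / (2.0 * a - alpha * a**2)


def simulated_scaled_variance(alpha, a=1.0, sigma=1.0,
                              n_steps=200_000, burn_in=20_000, seed=0):
    """Estimate the variance of x_k / sqrt(alpha) along one long SGD run."""
    rng = random.Random(seed)
    x = 0.0
    total = 0.0
    total_sq = 0.0
    count = 0
    for k in range(n_steps):
        # Noisy gradient of the assumed objective f(x) = a * x**2 / 2.
        noisy_grad = a * x + sigma * rng.gauss(0.0, 1.0)
        x -= alpha * noisy_grad
        if k >= burn_in:
            y = x / alpha**0.5  # iterate scaled by 1/sqrt(alpha)
            total += y
            total_sq += y * y
            count += 1
    mean = total / count
    return total_sq / count - mean * mean


if __name__ == "__main__":
    for alpha in (0.1, 0.05):
        print(alpha,
              simulated_scaled_variance(alpha),
              exact_scaled_variance(alpha))
```

For small stepsizes both numbers should sit close to σ²/(2a), which is 0.5 with the default parameters. This toy case corresponds to setting (2) with a scalar "matrix" −a; it says nothing about the non-Gaussian regimes the paper explores.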
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Random Matrices and Applications
Methods: Stochastic Gradient Descent
