Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization

Zaiwei Chen, Shancong Mou, and Siva Theja Maguluri

arXiv:2111.06328 · cs.LG · November 12, 2021

Open Access

TL;DR

This paper characterizes the asymptotic stationary distribution of constant stepsize SGD algorithms as the stepsize approaches zero, revealing conditions under which it converges to a Gaussian distribution and exploring non-Gaussian behaviors.
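The Gaussian-limit claim can be illustrated on the simplest instance of setting (1). The sketch below is a toy experiment of our own, not one from the paper. It assumes a one-dimensional quadratic objective $f(x) = \lambda x^2/2$ with i.i.d. Rademacher (±σ) gradient noise, chosen on purpose to be non-Gaussian. The helper names `scaled_stationary_samples` and `excess_kurtosis` are hypothetical. The code runs many independent constant stepsize SGD chains to near-stationarity, scales the iterates by $1/\sqrt{\alpha}$, and compares the empirical variance and excess kurtosis with the exact values for this linear recursion.

```python
import numpy as np


def scaled_stationary_samples(alpha, lam=2.0, sigma=1.0, n_chains=40_000, seed=0):
    """Approximate samples of x / sqrt(alpha) under the stationary law of
    constant-stepsize SGD on the hypothetical objective f(x) = lam * x**2 / 2.

    The stochastic gradient is lam * x + xi, where xi is i.i.d. Rademacher
    noise taking values -sigma or +sigma (mean 0, variance sigma**2), so the
    noise is deliberately non-Gaussian.
    """
    rng = np.random.default_rng(seed)
    # Each step contracts by (1 - alpha * lam); about 10 mixing times make the
    # influence of the starting point x_0 = 0 negligible.
    n_steps = int(10 / (alpha * lam)) + 1
    x = np.zeros(n_chains)  # one independent chain per array entry
    for _ in range(n_steps):
        xi = rng.choice([-sigma, sigma], size=n_chains)
        x = x - alpha * (lam * x + xi)  # constant-stepsize SGD update
    return x / np.sqrt(alpha)


def excess_kurtosis(z):
    """Sample excess kurtosis (equal to 0 for a Gaussian)."""
    c = z - z.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0


lam, sigma = 2.0, 1.0
for alpha in (0.1, 0.01):
    z = scaled_stationary_samples(alpha, lam, sigma)
    rho2 = (1 - alpha * lam) ** 2
    var_exact = sigma**2 / (2 * lam - alpha * lam**2)
    # Rademacher noise has excess kurtosis -2; a geometric weighted sum damps it.
    kurt_exact = -2.0 * (1 - rho2) / (1 + rho2)
    print(f"alpha={alpha}: var {z.var():.4f} (exact {var_exact:.4f}), "
          f"excess kurtosis {excess_kurtosis(z):.3f} (exact {kurt_exact:.3f})")
```

With $\lambda = 2$ and $\sigma = 1$, theory gives a scaled variance of $1/(4 - 4\alpha)$, which is about 0.278 at $\alpha = 0.1$ and 0.253 at $\alpha = 0.01$. The excess kurtosis is about $-0.44$ at $\alpha = 0.1$ and $-0.04$ at $\alpha = 0.01$. As $\alpha$ shrinks, the scaled stationary law moves toward a Gaussian even though every individual noise sample is ±σ.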

Contribution

The work provides a novel asymptotic analysis of the stationary distribution of constant stepsize stochastic approximation algorithms, including explicit characterizations and numerical insights.

Findings

01

Limiting distribution is Gaussian under certain conditions.

02

Scaling factor for the distribution may differ from 1/√α.

03

Numerical experiments suggest non-Gaussian limits beyond classical CLT assumptions.
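Finding 02 is measured against the classical $1/\sqrt{\alpha}$ scaling. A short calculation shows where that baseline comes from. It uses simplifying assumptions of our own: a scalar iterate measured from the optimum, SGD on $f(x) = \lambda x^2/2$, and i.i.d. zero-mean gradient noise $\xi_k$ with variance $\sigma^2 > 0$. Here $x_\infty$ denotes a sample from the stationary distribution and $\beta$ is a candidate scaling exponent.

```latex
\begin{aligned}
x_{k+1} &= x_k - \alpha(\lambda x_k + \xi_k) = (1-\alpha\lambda)\,x_k - \alpha\,\xi_k,
  \qquad 0 < \alpha\lambda < 2,\\
\operatorname{Var}(x_\infty) &= \alpha^2\sigma^2 \sum_{j\ge 0} (1-\alpha\lambda)^{2j}
  = \frac{\alpha^2\sigma^2}{1-(1-\alpha\lambda)^2}
  = \frac{\alpha\,\sigma^2}{2\lambda - \alpha\lambda^2},\\
\operatorname{Var}\!\left(x_\infty/\alpha^{\beta}\right)
  &= \alpha^{1-2\beta}\,\frac{\sigma^2}{2\lambda - \alpha\lambda^2}
  \;\longrightarrow\;
  \begin{cases}
    0, & \beta < 1/2,\\
    \sigma^2/(2\lambda), & \beta = 1/2,\\
    \infty, & \beta > 1/2,
  \end{cases}
  \quad \text{as } \alpha \to 0.
\end{aligned}
```

In this additive-noise model, only $\beta = 1/2$ yields a nondegenerate limit. A different scaling factor, as in finding 02, therefore has to come from settings outside this simple model.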

Abstract

Stochastic approximation (SA) and stochastic gradient descent (SGD) algorithms are workhorses of modern machine learning. Their constant stepsize variants are preferred in practice because of their fast convergence. However, constant stepsize stochastic iterative algorithms do not converge asymptotically to the optimal solution. Instead, they have a stationary distribution, which in general cannot be characterized analytically. In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution in the limit as the constant stepsize goes to zero. Specifically, we consider the following three settings: (1) SGD algorithms with a smooth and strongly convex objective, (2) linear SA algorithms involving a Hurwitz matrix, and (3) nonlinear SA algorithms involving a contractive operator. When the iterate is scaled by $1/\sqrt{\alpha}$, where $\alpha$ is the…
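For setting (2), a small computation shows how a Lyapunov equation can emerge as the stepsize shrinks. The sketch assumes the linear recursion $x_{k+1} = x_k + \alpha(A x_k + w_k)$ with Hurwitz $A$, the fixed point shifted to the origin, and i.i.d. zero-mean noise $w_k$ with covariance $Q$. These are our own simplifying assumptions and may be narrower than the paper's noise model. Under them, the exact stationary covariance $S_\alpha$ satisfies $S_\alpha = (I+\alpha A)S_\alpha(I+\alpha A)^\top + \alpha^2 Q$. Consequently, $\Sigma_\alpha = S_\alpha/\alpha$ solves $A\Sigma_\alpha + \Sigma_\alpha A^\top + \alpha A\Sigma_\alpha A^\top + Q = 0$, which tends to $A\Sigma + \Sigma A^\top + Q = 0$ as $\alpha \to 0$. The helper names `solve_lyapunov` and `scaled_stationary_cov` and the example matrices are hypothetical. Both helpers use dense Kronecker-product solves, which is only sensible in small dimensions.

```python
import numpy as np


def solve_lyapunov(A, Q):
    """Solve A @ S + S @ A.T + Q = 0 for S via the column-major vec identities
    vec(A S) = (I kron A) vec(S) and vec(S A^T) = (A kron I) vec(S)."""
    d = A.shape[0]
    I = np.eye(d)
    K = np.kron(I, A) + np.kron(A, I)
    s = np.linalg.solve(K, -Q.flatten(order="F"))
    return s.reshape((d, d), order="F")


def scaled_stationary_cov(A, Q, alpha):
    """Exact stationary covariance of x / sqrt(alpha) for the hypothetical
    recursion x_{k+1} = x_k + alpha * (A x_k + w_k), w_k i.i.d. with cov Q.

    Solves S = B S B^T + alpha^2 Q with B = I + alpha * A, using
    vec(B S B^T) = (B kron B) vec(S); requires spectral radius of B < 1.
    """
    d = A.shape[0]
    B = np.eye(d) + alpha * A
    K = np.eye(d * d) - np.kron(B, B)
    s = np.linalg.solve(K, alpha**2 * Q.flatten(order="F"))
    return s.reshape((d, d), order="F") / alpha


A = np.array([[-1.0, 2.0], [0.0, -3.0]])  # Hurwitz: eigenvalues -1 and -3
Q = np.array([[1.0, 0.5], [0.5, 2.0]])    # noise covariance (positive definite)
Sigma_limit = solve_lyapunov(A, Q)
for alpha in (0.1, 0.01, 0.001):
    gap = np.abs(scaled_stationary_cov(A, Q, alpha) - Sigma_limit).max()
    print(f"alpha={alpha}: max |Sigma_alpha - Sigma| = {gap:.2e}")
```

For this $A$ and $Q$, the limit can be checked by hand: $\Sigma = \begin{pmatrix}13/12 & 7/24\\ 7/24 & 1/3\end{pmatrix}$. The printed gap shrinks roughly in proportion to $\alpha$, consistent with the $O(\alpha)$ correction term $\alpha A\Sigma_\alpha A^\top$. In higher dimensions, `scipy.linalg.solve_continuous_lyapunov(A, -Q)` is a natural replacement for the Kronecker solve, since that routine solves $AX + XA^{H} = Q$.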

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Random Matrices and Applications

Methods: Stochastic Gradient Descent