A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, Aaron Sidford

TL;DR
This paper offers a simplified proof of the minimax optimality of averaged SGD for least squares by analyzing it as a stochastic process and characterizing its stationary covariance, including constant factors and model mis-specification.
Contribution
It provides a new, simplified proof of the minimax optimality of averaged SGD for least squares through stochastic process analysis and covariance characterization.
Findings
Iterate-averaged SGD is statistically minimax optimal for least squares.
Stationary covariance matrix of SGD is sharply characterized.
The rate characterization captures constant factors and addresses model mis-specification.
Abstract
This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addresses model mis-specification.
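As a rough illustration of the setting the abstract describes, the sketch below runs single-pass, constant-step-size SGD on a synthetic least squares problem and compares the last iterate with the average of the iterates. It is not the authors' code: the function names, the Gaussian data model, the step size, and the burn-in length are all assumptions chosen for the example.

```python
import numpy as np


def make_data(n, d, noise_std, seed=0):
    """Synthetic well-specified least squares data (an assumption for this sketch)."""
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=d)                  # ground-truth weights
    X = rng.normal(size=(n, d))                  # isotropic Gaussian features
    y = X @ w_star + noise_std * rng.normal(size=n)
    return X, y, w_star


def averaged_sgd_least_squares(X, y, step_size, burn_in=0, seed=0):
    """One pass of constant-step-size SGD on the squared loss.

    Returns the last iterate and the average of the iterates
    produced at step index burn_in and later.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    count = 0
    for t, i in enumerate(rng.permutation(n)):
        residual = X[i] @ w - y[i]               # prediction error on sample i
        w = w - step_size * residual * X[i]      # stochastic gradient step
        if t >= burn_in:
            w_sum += w
            count += 1
    w_avg = w_sum / max(count, 1)
    return w, w_avg


# Hypothetical problem sizes and hyperparameters, chosen only for illustration.
X, y, w_star = make_data(n=20000, d=5, noise_std=1.0)
w_last, w_avg = averaged_sgd_least_squares(X, y, step_size=0.05, burn_in=1000)
print("last-iterate error:", np.linalg.norm(w_last - w_star))
print("averaged error:    ", np.linalg.norm(w_avg - w_star))
```

With a constant step size the last iterate keeps fluctuating around the solution, which is the stationary behaviour the paper characterizes through its covariance matrix. Averaging the iterates damps those fluctuations, so in runs like this one the averaged estimate is typically much closer to the true weights.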
Taxonomy
Methods: Stochastic Gradient Descent
