A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

Prateek Jain; Sham M. Kakade; Rahul Kidambi; Praneeth Netrapalli; Venkata Krishna Pillutla; Aaron Sidford

arXiv:1710.09430 · stat.ML · July 24, 2018

TL;DR

This paper offers a simplified proof of the minimax optimality of averaged SGD for least squares by analyzing it as a stochastic process and characterizing its stationary covariance, including constant factors and model mis-specification.

Contribution

The paper gives a new, simplified proof of the minimax optimality of iterate-averaged SGD for least squares, obtained by treating SGD as a stochastic process and sharply characterizing that process's stationary covariance matrix.
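The stochastic-process view can be written out for the standard least-squares setup. The notation below is an assumption of this sketch, not taken from this page: each step uses a fresh sample (x_t, y_t), w* is the best linear predictor, ε_t = y_t - ⟨w*, x_t⟩ is the residual (not assumed independent of x_t, which leaves room for mis-specification), H = E[x xᵀ], Σ = E[ε² x xᵀ], and γ is a constant step size. Subtracting w* from the SGD update turns it into a linear Markov chain in the centered iterate η_t = w_t - w*, whose stationary covariance Φ_∞ solves a fixed-point equation; the estimator of interest is the iterate average w̄_n.

```latex
% Assumed notation (not from this page): fresh sample (x_t, y_t), best linear predictor w^*,
% residual \varepsilon_t = y_t - \langle w^*, x_t \rangle,
% H = \mathbb{E}[x x^\top], \Sigma = \mathbb{E}[\varepsilon^2 x x^\top], constant step size \gamma.
\begin{aligned}
w_{t+1} &= w_t - \gamma\,\big(\langle w_t, x_t\rangle - y_t\big)\,x_t, \\
\eta_t &:= w_t - w^*, \qquad
\eta_{t+1} = \big(I - \gamma\, x_t x_t^\top\big)\,\eta_t + \gamma\,\varepsilon_t\, x_t, \\
\Phi_\infty &:= \lim_{t\to\infty}\mathbb{E}\big[\eta_t \eta_t^\top\big]
 = \mathbb{E}\big[(I - \gamma\, x x^\top)\,\Phi_\infty\,(I - \gamma\, x x^\top)\big] + \gamma^2\,\Sigma, \\
\bar{w}_n &= \frac{1}{n}\sum_{t=1}^{n} w_t .
\end{aligned}
```

Because w* minimizes the risk, E[εx] = 0, so E[η_t] = (I - γH)^t η_0 → 0 when γ is small enough, and the cross terms drop out of the stationary covariance equation.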

Findings

01

Iterate-averaged SGD is statistically minimax optimal for least squares.

02

The stationary covariance matrix of the SGD process is sharply characterized.

03

The finite-rate optimality characterization captures constant factors and addresses model mis-specification.
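To illustrate what the stationary covariance of the SGD Markov chain means, the sketch below takes the scalar case (d = 1) under assumed Gaussian data: x ~ N(0, 1) and independent noise ε ~ N(0, σ²). The centered iterate η_t = w_t - w* then follows η_{t+1} = (1 - γx_t²)η_t + γε_t x_t, so its stationary variance v solves v = E[(1 - γx²)²]v + γ²E[ε²x²]. With E[x⁴] = 3 this gives v = γσ²/(2 - 3γ), valid for 0 < γ < 2/3. The code compares that closed form with a long simulated run. The function names and parameter values are hypothetical, and this toy case is only an illustration, not the paper's general multivariate characterization.

```python
import random


def stationary_variance_1d(gamma, sigma):
    """Closed-form stationary variance of eta_t = w_t - w* for scalar SGD.

    Assumes x ~ N(0, 1) and independent noise eps ~ N(0, sigma^2), so that
    E[(1 - gamma x^2)^2] = 1 - 2 gamma + 3 gamma^2 and E[eps^2 x^2] = sigma^2.
    Solving v = E[(1 - gamma x^2)^2] v + gamma^2 sigma^2 for v gives the result.
    """
    if not 0.0 < gamma < 2.0 / 3.0:
        raise ValueError("the chain is mean-square stable only for 0 < gamma < 2/3")
    return gamma * sigma**2 / (2.0 - 3.0 * gamma)


def simulate_sgd_variance(gamma, sigma, w_star=1.5, steps=200_000, burn_in=5_000, seed=0):
    """Run constant-step SGD on y = w_star * x + eps (hypothetical toy data) and
    return the empirical second moment of eta_t = w_t - w_star after burn-in."""
    rng = random.Random(seed)
    w = 0.0
    total = 0.0
    for t in range(burn_in + steps):
        x = rng.gauss(0.0, 1.0)
        y = w_star * x + rng.gauss(0.0, sigma)
        w -= gamma * (w * x - y) * x  # SGD step on the loss 0.5 * (w * x - y)^2
        if t >= burn_in:  # discard the transient from the initial point w = 0
            eta = w - w_star
            total += eta * eta
    return total / steps


gamma, sigma = 0.1, 1.0
theory = stationary_variance_1d(gamma, sigma)
empirical = simulate_sgd_variance(gamma, sigma)
print(f"closed form: {theory:.4f}   simulated: {empirical:.4f}")
```

Since E[η_t] → 0 at stationarity, the averaged η_t² estimates the variance directly; consecutive iterates are correlated, so the run is made long rather than short and many.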

Abstract

This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addresses model mis-specification.
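A minimal sketch of iterate-averaged SGD for least squares, assuming a constant step size, a single pass over fresh samples, plain Polyak-Ruppert averaging of all iterates starting from w_0 = 0, and synthetic well-specified Gaussian data. The function names and parameter values (γ = 0.05, n = 20,000, σ = 0.5) are hypothetical choices for illustration, not the paper's exact algorithm or experiments; only the Python standard library is used.

```python
import random


def averaged_sgd_least_squares(samples, dim, gamma):
    """Constant-step SGD on the squared loss with Polyak-Ruppert averaging.

    samples: iterable of (x, y) pairs, each x a list of length dim.
    Returns the average of the iterates w_1, ..., w_n.
    """
    w = [0.0] * dim
    w_sum = [0.0] * dim
    n = 0
    for x, y in samples:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Stochastic gradient of 0.5 * (<w, x> - y)^2 is residual * x.
        w = [wi - gamma * residual * xi for wi, xi in zip(w, x)]
        w_sum = [si + wi for si, wi in zip(w_sum, w)]
        n += 1
    return [si / n for si in w_sum]


def gaussian_stream(w_star, sigma, n, seed=0):
    """Hypothetical well-specified data: x ~ N(0, I), y = <w_star, x> + N(0, sigma^2)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        y = sum(a * b for a, b in zip(w_star, x)) + rng.gauss(0.0, sigma)
        yield x, y


w_star = [1.0, -2.0, 0.5]
sigma, n, gamma = 0.5, 20_000, 0.05
w_bar = averaged_sgd_least_squares(gaussian_stream(w_star, sigma, n), len(w_star), gamma)

# With H = E[x x^T] = I, the excess risk E[<w_bar - w_star, x>^2] equals ||w_bar - w_star||^2.
excess = sum((a - b) ** 2 for a, b in zip(w_bar, w_star))
print(f"excess risk: {excess:.2e}   sigma^2 * d / n: {sigma**2 * len(w_star) / n:.2e}")
```

The printed reference σ²d/n is the classical rate for well-specified least squares with d parameters and n samples. A single run is noisy, so only the order of magnitude of the two numbers should be compared.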

Taxonomy

Methods: Stochastic Gradient Descent