Accelerating Stochastic Gradient Descent For Least Squares Regression

Prateek Jain; Sham M. Kakade; Rahul Kidambi; Praneeth Netrapalli; Aaron Sidford

arXiv:1704.08227 · stat.ML · August 2, 2018 · 25 cites

Open Access

TL;DR

This paper shows that accelerated stochastic gradient methods can be made robust to statistical errors in least squares regression, achieving the minimax optimal statistical risk faster than standard stochastic gradient descent.

Contribution

It introduces an accelerated stochastic gradient method for least squares regression that is provably robust to statistical errors and reaches the minimax optimal statistical risk faster than stochastic gradient descent.

Findings

01

Achieves minimax optimal statistical risk faster than SGD

02

Provides a sharp characterization of accelerated stochastic gradient descent as a stochastic process

03

Refutes the belief that acceleration cannot be effectively used in stochastic optimization

Abstract

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes the conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this…
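For orientation, here is a minimal sketch of the baseline the abstract compares against: plain single-pass stochastic gradient descent on the squared loss, with iterate averaging. It is not the accelerated method introduced in the paper, whose update rule is not given in this abstract. The function names (`sgd_least_squares`, `make_stream`), the synthetic Gaussian data, the step size, and the noise level are all illustrative assumptions.

```python
import random


def sgd_least_squares(samples, dim, step_size):
    """Single pass of plain SGD on the loss 0.5 * (<w, x> - y)^2.

    Returns the last iterate and the running average of all iterates.
    This is the standard baseline, not the paper's accelerated method.
    """
    w = [0.0] * dim
    avg = [0.0] * dim
    for t, (x, y) in enumerate(samples, start=1):
        # Stochastic gradient of the squared loss at w is residual * x.
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - step_size * residual * xi for wi, xi in zip(w, x)]
        # Incremental update of the mean of the iterates seen so far.
        avg = [a + (wi - a) / t for a, wi in zip(avg, w)]
    return w, avg


def make_stream(w_star, n, noise_std, seed=0):
    """Hypothetical data source: y = <w_star, x> + Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        y = sum(wi * xi for wi, xi in zip(w_star, x)) + rng.gauss(0.0, noise_std)
        yield x, y


w_star = [1.0, -2.0, 0.5]  # assumed ground-truth weights for the demo
stream = make_stream(w_star, n=5000, noise_std=0.1)
w_last, w_avg = sgd_least_squares(stream, dim=3, step_size=0.05)
error = max(abs(a - b) for a, b in zip(w_avg, w_star))
print("max coordinate error of averaged iterate:", error)
```

Each sample is used once, matching the stochastic approximation setting the abstract describes. The step size here is chosen conservatively for unit-variance Gaussian inputs in three dimensions and would need retuning for other data.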

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data