Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

Sharan Vaswani; Aaron Mishkin; Issam Laradji; Mark Schmidt; Gauthier Gidel; Simon Lacoste-Julien

arXiv:1905.09997 · cs.LG · June 7, 2021 · 24 cites

TL;DR

This paper proposes line-search techniques that set the step-size of stochastic gradient descent (SGD) automatically. In the interpolation setting, the resulting methods achieve fast convergence rates for convex, non-convex, and saddle-point problems, and they show practical benefits on classification tasks.

Contribution

It proposes stochastic Armijo line-search methods for SGD that adaptively set step-sizes, ensuring convergence rates comparable to full-batch gradient descent under interpolation conditions.
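For reference, the stochastic Armijo condition behind this contribution can be written as below. The notation is an assumption of this summary and does not appear on the page itself: $w_k$ is the iterate, $f_{i_k}$ is the loss on the example or mini-batch sampled at iteration $k$, $\eta_k$ is the step-size being tested, and $c > 0$ is a user-chosen constant.

```latex
f_{i_k}\bigl(w_k - \eta_k \nabla f_{i_k}(w_k)\bigr)
  \;\le\; f_{i_k}(w_k) - c\,\eta_k\,\bigl\lVert \nabla f_{i_k}(w_k) \bigr\rVert^{2}
```

Both sides use the same sampled function $f_{i_k}$, so the check needs only the current mini-batch and no full-gradient evaluation.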

Findings

01 In the interpolation setting, SGD with a stochastic Armijo line-search attains the deterministic convergence rates for convex and strongly-convex functions.

02 The methods are robust to hyper-parameter choices and improve practical convergence.

03 Experiments show faster convergence and better generalization on classification tasks.

Abstract

Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques to automatically set the step-size when training models that can interpolate the data. In the interpolation setting, we prove that SGD with a stochastic variant of the classic Armijo line-search attains the deterministic convergence rates for both convex and strongly-convex functions. Under additional assumptions, SGD with Armijo line-search is shown to achieve fast convergence for non-convex functions. Furthermore, we show that stochastic extra-gradient with a Lipschitz line-search attains…
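The idea in the abstract can be illustrated with a minimal sketch of SGD with a backtracking stochastic Armijo line-search. This is not the authors' implementation (their code is in the IssamLaradji/sls repository). The toy least-squares problem, the function names, and the values `eta_max=1.0`, `c=0.1`, and `beta=0.5` are all assumptions chosen for illustration. The data are generated from a linear model without noise, so the model can interpolate them.

```python
import random


def loss_and_grad(w, x, y):
    """Squared loss 0.5 * (w.x - y)^2 on one example, and its gradient in w."""
    r = sum(wi * xi for wi, xi in zip(w, x)) - y
    return 0.5 * r * r, [r * xi for xi in x]


def armijo_step(w, x, y, eta_max=1.0, c=0.1, beta=0.5, max_backtracks=50):
    """Try eta_max, then shrink by beta until the Armijo condition holds.

    The condition is checked on the same sampled example (x, y) that
    produced the gradient. Returns (new_w, accepted_eta).
    """
    f0, g = loss_and_grad(w, x, y)
    g_sq = sum(gi * gi for gi in g)
    eta = eta_max
    for _ in range(max_backtracks):
        w_new = [wi - eta * gi for wi, gi in zip(w, g)]
        f_new, _ = loss_and_grad(w_new, x, y)
        if f_new <= f0 - c * eta * g_sq:
            return w_new, eta
        eta *= beta
    # No step-size was accepted: keep the iterate unchanged.
    return list(w), 0.0


def sgd_armijo(data, dim, epochs=200, seed=0):
    """Run SGD over shuffled single examples, line-searching at every step."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            x, y = data[i]
            w, _ = armijo_step(w, x, y)
    return w


# Toy interpolating problem (hypothetical): labels come from w_star, no noise.
rng = random.Random(1)
w_star = [1.0, -2.0, 0.5]
data = []
for _ in range(20):
    x = [rng.uniform(-1.0, 1.0) for _ in w_star]
    data.append((x, sum(a * b for a, b in zip(w_star, x))))

w_hat = sgd_armijo(data, dim=len(w_star))
final_loss = sum(loss_and_grad(w_hat, x, y)[0] for x, y in data) / len(data)
```

No step-size is tuned by hand here: each update starts from `eta_max` and backtracks only when the sampled loss does not decrease enough. On noisy data, where interpolation fails, this sketch carries no such guarantee.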

Code & Models

Repositories

IssamLaradji/sls (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent