Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien

TL;DR
This paper introduces line-search techniques for stochastic gradient descent (SGD) that set the step-size automatically, achieving fast convergence rates in the interpolation setting for convex, non-convex, and saddle-point problems, with practical benefits demonstrated on classification tasks.
Contribution
It proposes stochastic Armijo line-search methods for SGD that adaptively set step-sizes, ensuring convergence rates comparable to full-batch gradient descent under interpolation conditions.
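For reference, the stochastic Armijo condition can be written as below. The notation is an assumption made for this summary: $w_k$ is the iterate, $i_k$ the sampled example (or mini-batch), $\eta_k$ the step-size being tested, and $c > 0$ a line-search hyper-parameter. The key point is that the function value and the gradient are evaluated on the same sample.

```latex
f_{i_k}\!\bigl(w_k - \eta_k \nabla f_{i_k}(w_k)\bigr)
  \;\le\; f_{i_k}(w_k) \;-\; c\,\eta_k\,\bigl\|\nabla f_{i_k}(w_k)\bigr\|^2
```

A backtracking search starts from a maximum step-size and shrinks $\eta_k$ by a constant factor until this inequality holds.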
Findings
In the interpolation setting, SGD with a stochastic Armijo line-search attains the deterministic convergence rates for both convex and strongly-convex functions.
The methods are robust to hyper-parameter choices and improve practical convergence.
Experiments show faster convergence and better generalization on classification tasks.
Abstract
Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques to automatically set the step-size when training models that can interpolate the data. In the interpolation setting, we prove that SGD with a stochastic variant of the classic Armijo line-search attains the deterministic convergence rates for both convex and strongly-convex functions. Under additional assumptions, SGD with Armijo line-search is shown to achieve fast convergence for non-convex functions. Furthermore, we show that stochastic extra-gradient with a Lipschitz line-search attains…
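The idea in the abstract can be illustrated with a minimal sketch: SGD with a backtracking stochastic Armijo line-search on an over-parameterized least-squares problem with noiseless labels, so that interpolation holds. This is not the authors' implementation. The function name `armijo_sgd`, the parameters `eta_max`, `c`, `beta`, and `max_backtracks`, their default values, and the choice to reset the step-size to `eta_max` at every iteration are all assumptions made for illustration.

```python
import numpy as np


def armijo_sgd(X, y, epochs=100, eta_max=1.0, c=0.1, beta=0.5,
               max_backtracks=50, seed=0):
    """SGD on 0.5 * (x_i @ w - y_i)**2 with a stochastic Armijo line-search.

    All hyper-parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            xi, yi = X[i], y[i]
            residual = xi @ w - yi
            loss = 0.5 * residual ** 2      # loss on the sampled example
            grad = residual * xi            # gradient on the same example
            grad_sq = grad @ grad
            if grad_sq == 0.0:
                continue                    # this example is already fit
            eta = eta_max                   # reset the step-size (a simplification)
            for _ in range(max_backtracks):
                trial_loss = 0.5 * (xi @ (w - eta * grad) - yi) ** 2
                # Armijo condition, checked on the same sampled example
                if trial_loss <= loss - c * eta * grad_sq:
                    break
                eta *= beta                 # shrink the step and try again
            # If the cap is hit, eta is tiny and the step is harmless.
            w = w - eta * grad
    return w


def mean_loss(X, y, w):
    """Average squared loss over the full data set."""
    return 0.5 * np.mean((X @ w - y) ** 2)


# Over-parameterized problem: 10 examples, 100 features, noiseless labels,
# so some w fits every example exactly (interpolation).
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 100))
y = X @ rng.standard_normal(100)

initial_loss = mean_loss(X, y, np.zeros(100))
final_loss = mean_loss(X, y, armijo_sgd(X, y))
print(f"initial loss {initial_loss:.3e}, final loss {final_loss:.3e}")
```

No step-size is tuned by hand here: the line-search only needs an upper bound `eta_max` and backtracks from it on each sampled example.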
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
