Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma

TL;DR
This paper introduces a regularity condition for overparameterized machine learning models under which stochastic gradient descent (SGD) matches the worst-case iteration complexity of the deterministic gradient method, while using only a single sampled gradient (or a minibatch) per iteration.
Contribution
The paper presents a novel regularity condition that ensures fast convergence of SGD in the interpolation regime, matching the worst-case iteration complexity of the deterministic gradient method.
Findings
The regularity condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
SGD achieves the same worst-case iteration complexity as deterministic methods under this condition.
The approach allows using only a single sample or minibatch per iteration without sacrificing convergence speed.
Abstract
Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method, while using only a single sampled gradient (or a minibatch) in each iteration. In contrast, all existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence. Finally, we demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
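As a rough illustration of the interpolation setting described in the abstract, the sketch below runs single-sample SGD on a toy overparameterized least-squares problem (more parameters than samples, so an exact fit exists). It is not the paper's experiment and does not check the paper's regularity condition; the problem sizes, the unit-norm rows, the step size of 1, and the function name `sgd_interpolation_demo` are all assumptions chosen for this example. With unit-norm rows, each per-sample loss is 1-smooth, so a step size of 1 is a large step rather than a small one.

```python
import numpy as np


def sgd_interpolation_demo(n_samples=20, n_params=100, n_steps=3000,
                           step_size=1.0, seed=0):
    """Single-sample SGD on an overparameterized least-squares toy problem.

    Hypothetical setup for illustration only: n_params > n_samples, so the
    linear system A x = b has exact solutions (the interpolation regime).
    """
    rng = np.random.default_rng(seed)

    # Random data with unit-norm rows, so each per-sample loss is 1-smooth.
    A = rng.standard_normal((n_samples, n_params))
    A /= np.linalg.norm(A, axis=1, keepdims=True)

    # Labels generated by a planted parameter vector, so zero loss is attainable.
    x_true = rng.standard_normal(n_params)
    b = A @ x_true

    def loss(x):
        # Average of the per-sample losses 0.5 * (a_i . x - b_i)^2.
        return 0.5 * np.mean((A @ x - b) ** 2)

    x = np.zeros(n_params)
    losses = [loss(x)]
    for _ in range(n_steps):
        i = rng.integers(n_samples)          # draw one sample uniformly
        residual = A[i] @ x - b[i]           # scalar prediction error
        x = x - step_size * residual * A[i]  # gradient step on sample i only
        losses.append(loss(x))
    return np.array(losses)


losses = sgd_interpolation_demo()
print(f"initial loss: {losses[0]:.3e}, final loss: {losses[-1]:.3e}")
```

Because every sample can be fit exactly, the stochastic gradients vanish at a solution, and in this toy run the loss keeps shrinking without any step-size decay. Plotting `losses` on a log scale is a simple way to inspect the rate.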
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM
