Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

Chaoyue Liu; Dmitriy Drusvyatskiy; Mikhail Belkin; Damek Davis; Yi-An Ma

arXiv:2306.02601·cs.LG·June 6, 2023·1 cites

Open Access

TL;DR

This paper introduces a regularity condition for overparametrized machine learning models under which stochastic gradient descent attains the same worst-case iteration complexity as the deterministic gradient method, while using only a single sampled gradient (or a minibatch) per iteration.

Contribution

The paper proposes a regularity condition under which SGD converges quickly in the interpolation regime, matching the worst-case iteration complexity of the deterministic gradient method.

Findings

1. The regularity condition holds when training sufficiently wide feedforward neural networks with a linear output layer.

2. Under this condition, SGD achieves the same worst-case iteration complexity as the deterministic gradient method.

3. SGD needs only a single sample or a minibatch per iteration to attain this rate.

Abstract

Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method, while using only a single sampled gradient (or a minibatch) in each iteration. In contrast, all existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence. Finally, we demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
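As a rough illustration of the interpolation regime described above, the sketch below runs single-sample SGD on a synthetic overparametrized least-squares problem (more parameters than samples, so zero loss is attainable). It is not the paper's regularity condition or its analysis. The function name `sgd_interpolation`, the Gaussian data, and the constant step size `1 / max_i ||a_i||^2` are all assumptions chosen for this toy example.

```python
import numpy as np

def sgd_interpolation(n=20, d=100, steps=3000, seed=0):
    """Single-sample SGD on a consistent least-squares problem (hypothetical toy setup)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))       # n samples, d > n parameters
    x_star = rng.standard_normal(d)
    b = A @ x_star                        # labels fit exactly: interpolation holds

    x = np.zeros(d)
    # Constant step size, an assumption for this sketch (not taken from the paper).
    eta = 1.0 / np.max(np.sum(A ** 2, axis=1))

    losses = np.empty(steps)
    for k in range(steps):
        i = rng.integers(n)               # draw one sample uniformly at random
        residual = A[i] @ x - b[i]
        x -= eta * residual * A[i]        # gradient of 0.5 * (a_i . x - b_i)^2
        losses[k] = 0.5 * np.mean((A @ x - b) ** 2)
    return losses

losses = sgd_interpolation()
print(f"first loss: {losses[0]:.3e}, last loss: {losses[-1]:.3e}")
```

Because every sample can be fit exactly, the stochastic gradient vanishes at the solution, which is why a constant step size suffices here and no decaying schedule is needed.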

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM