How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Zeyuan Allen-Zhu

arXiv:1801.02982·cs.LG·July 30, 2021·39 cites

TL;DR

This paper introduces two stochastic gradient algorithms: SGD3, which finds points with small gradient norm for convex objectives at a near-optimal rate, and SGD5, which finds approximate local minima of nonconvex objectives faster than previous SGD variants.

Contribution

The paper presents two algorithms. SGD3 reduces the gradient norm of convex objectives at a near-optimal rate, and SGD5 finds approximate local minima of nonconvex objectives at a rate that improves upon prior SGD variants.

Findings

01

SGD3 achieves the near-optimal rate $\tilde{O}(ε^{-2})$ for finding a point with gradient norm $ε$ when the objective is convex.

02

SGD5 achieves rate $\tilde{O}(ε^{-3.5})$ for finding an $ε$-approximate local minimum of a nonconvex objective.

03

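SGD3 improves on the best previously known rate $O(ε^{-8/3})$ [18]. SGD5 improves on the $\tilde{O}(ε^{-4})$ rate of earlier SGD variants [6, 15, 33] and is no slower than the best known stochastic version of Newton's method in all parameter regimes [30].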

Abstract

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm $ε$, we design an algorithm SGD3 with a near-optimal rate $\tilde{O}(ε^{-2})$, improving the best known rate $O(ε^{-8/3})$ of [18]. If $f(x)$ is nonconvex, to find its $ε$-approximate local minimum, we design an algorithm SGD5 with rate $\tilde{O}(ε^{-3.5})$, where previously SGD variants only achieve $\tilde{O}(ε^{-4})$ [6, 15, 33]. This is no slower than the best known stochastic version of Newton's method in all parameter regimes [30].
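To get a feel for what the exponents in the abstract mean, the following is a rough Python sketch that compares the leading-order terms $ε^{-p}$ of the four stated rates. It is only an illustration under strong assumptions: all constants and the logarithmic factors hidden by $\tilde{O}$ are ignored, and the names `iterations`, `speedup`, and `RATES` are hypothetical helpers introduced here, not anything from the paper.

```python
# Leading-order comparison of the rates quoted in the abstract.
# Assumption: constants and log factors hidden by O-tilde are dropped,
# so the numbers are orders of magnitude, not real iteration counts.

# Exponent p in eps**(-p) for each rate stated in the abstract.
RATES = {
    "convex, prior best [18]": 8 / 3,
    "convex, SGD3": 2.0,
    "nonconvex, prior SGD variants [6, 15, 33]": 4.0,
    "nonconvex, SGD5": 3.5,
}


def iterations(eps: float, exponent: float) -> float:
    """Return eps**(-exponent), the leading term of a rate O(eps^-exponent)."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return eps ** (-exponent)


def speedup(eps: float, old_exponent: float, new_exponent: float) -> float:
    """Ratio of the old leading term to the new one at accuracy eps."""
    return iterations(eps, old_exponent) / iterations(eps, new_exponent)


if __name__ == "__main__":
    eps = 1e-3
    for name, p in RATES.items():
        print(f"{name}: about {iterations(eps, p):.2e}")
    print(f"convex ratio: {speedup(eps, 8 / 3, 2.0):.1f}")
    print(f"nonconvex ratio: {speedup(eps, 4.0, 3.5):.1f}")
```

Under these assumptions, at $ε = 10^{-3}$ the convex ratio is $ε^{-2/3} = 100$ and the nonconvex ratio is $ε^{-1/2} \approx 31.6$. Both gaps widen as $ε$ shrinks, although the ignored logarithmic factors would reduce them somewhat in practice.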

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Image Processing Techniques

Methods: Stochastic Gradient Descent