Better Theory for SGD in the Nonconvex World

Ahmed Khaled; Peter Richtárik

arXiv:2002.03329 · math.OC · July 27, 2020 · 60 cites

Open Access

TL;DR

This paper advances the theoretical understanding of SGD in nonconvex optimization by introducing a more general assumption on the second moment of the stochastic gradient, achieving optimal convergence rates, and analyzing various sampling strategies, with experimental validation.
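
The sampling strategies mentioned above all rest on one property: if index $i$ is drawn with probability $p_i$ and the component gradient is rescaled by $1/(n p_i)$, the result is an unbiased estimate of the finite-sum gradient. The sketch below is a generic illustration, not code from the paper; the function names and the importance-sampling choice $p_i \propto L_i$ (per-component smoothness constants) are assumptions made for the example.

```python
import random


def importance_probs(lipschitz):
    """Hypothetical choice: p_i proportional to L_i, the smoothness constant of f_i."""
    total = sum(lipschitz)
    return [L / total for L in lipschitz]


def sample_gradient(grads, probs, rng):
    """Draw one index i with probability probs[i] and return grads[i] / (n * probs[i]),
    an unbiased estimate of the average gradient (1/n) * sum_i grads[i]."""
    n = len(grads)
    i = rng.choices(range(n), weights=probs, k=1)[0]
    scale = 1.0 / (n * probs[i])
    return [scale * g for g in grads[i]]


def expected_estimate(grads, probs):
    """Exact expectation of sample_gradient under the sampling distribution."""
    n, d = len(grads), len(grads[0])
    out = [0.0] * d
    for p, g in zip(probs, grads):
        for j in range(d):
            out[j] += p * g[j] / (n * p)
    return out


# Usage: three hypothetical component gradients in R^2.
grads = [[1.0, 2.0], [3.0, -1.0], [0.0, 4.0]]
probs = importance_probs([1.0, 2.0, 5.0])
print(expected_estimate(grads, probs))  # equals the plain average of grads (up to rounding)
```

Uniform sampling is the special case $p_i = 1/n$, where the rescaling factor is 1; nonuniform choices change the variance of the estimate but not its mean.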

Contribution

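It proposes a new variant of the expected smoothness assumption for SGD, which bounds the second moment of the stochastic gradient and is argued to be more general and more reasonable than the assumptions made in prior work. Under this assumption, it establishes optimal convergence rates for smooth nonconvex problems and for problems satisfying the Polyak-Łojasiewicz (PL) condition.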
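
The page does not state the assumption itself. As we recall it from the paper (verify the exact form against the PDF before relying on it), it bounds the second moment of an unbiased stochastic gradient $g(x)$ by three terms with nonnegative constants $A$, $B$, $C$, where $f^{\inf}$ is a lower bound on $f$:

```latex
% Expected smoothness ("ABC") bound, as recalled from the paper; A, B, C >= 0.
\[
  \mathbb{E}\bigl[\|g(x)\|^2\bigr] \le 2A\bigl(f(x) - f^{\inf}\bigr) + B\,\|\nabla f(x)\|^2 + C
  \qquad \text{for all } x \in \mathbb{R}^d.
\]
% Bounded variance  E||g(x) - \nabla f(x)||^2 <= \sigma^2  :  A = 0, B = 1, C = \sigma^2
% Bounded second moment  E||g(x)||^2 <= G^2               :  A = 0, B = 0, C = G^2
```

The bounded-variance case follows from $\mathbb{E}\|g\|^2 = \|\nabla f\|^2 + \mathbb{E}\|g - \nabla f\|^2$ for unbiased $g$. The extra $A$ term lets the second moment grow with the suboptimality $f(x) - f^{\inf}$, which both special cases rule out.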

Findings

01

Achieves the optimal $O(\varepsilon^{-4})$ rate for finding stationary points of smooth nonconvex functions.

02

Obtains the optimal $O(\varepsilon^{-1})$ rate under the Polyak-Łojasiewicz condition.

03

Validates theoretical results with experiments on real and synthetic data.
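
To see where the $\varepsilon^{-4}$ count in Finding 01 comes from, here is the classical argument in the simplest special case: $f$ is $L$-smooth, $g$ is unbiased with variance at most $\sigma^2$, and SGD uses a constant step $0 < \gamma \le 1/L$. This is an illustration of the rate only, not the paper's more general theorem; $\delta_0 = f(x_0) - f^{\inf}$.

```latex
% Iterates x_{t+1} = x_t - \gamma g(x_t); first line uses L-smoothness and E||g||^2 <= ||\nabla f||^2 + \sigma^2.
\begin{align*}
\mathbb{E} f(x_{t+1})
  &\le \mathbb{E} f(x_t) - \gamma\Bigl(1 - \tfrac{L\gamma}{2}\Bigr)\mathbb{E}\|\nabla f(x_t)\|^2 + \tfrac{L\gamma^2\sigma^2}{2}
   \le \mathbb{E} f(x_t) - \tfrac{\gamma}{2}\,\mathbb{E}\|\nabla f(x_t)\|^2 + \tfrac{L\gamma^2\sigma^2}{2}, \\
\min_{0 \le t < K} \mathbb{E}\|\nabla f(x_t)\|^2
  &\le \frac{1}{K}\sum_{t=0}^{K-1} \mathbb{E}\|\nabla f(x_t)\|^2
   \le \frac{2\delta_0}{\gamma K} + L\gamma\sigma^2, \\
\gamma = \min\Bigl\{\tfrac{1}{L}, \tfrac{\varepsilon^2}{2L\sigma^2}\Bigr\},\quad
K \ge \frac{8L\sigma^2\delta_0}{\varepsilon^4} + \frac{4L\delta_0}{\varepsilon^2}
  &\;\Longrightarrow\; \min_{0 \le t < K} \mathbb{E}\|\nabla f(x_t)\|^2 \le \varepsilon^2.
\end{align*}
```

The second line telescopes the first over $t = 0, \dots, K-1$ and uses $f(x_K) \ge f^{\inf}$; the choice of $\gamma$ and $K$ makes each term at most $\varepsilon^2/2$. Since $\mathbb{E}\|\nabla f(x_t)\| \le (\mathbb{E}\|\nabla f(x_t)\|^2)^{1/2}$, some iterate has expected gradient norm at most $\varepsilon$ after $K = O(\varepsilon^{-4})$ steps. In the standard reading of Finding 02, the PL condition $\tfrac12\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^*)$ replaces stationarity with the target $\mathbb{E}[f(x_K) - f^*] \le \varepsilon$, which takes $O(\varepsilon^{-1})$ iterations.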

Abstract

Large-scale nonconvex optimization problems are ubiquitous in modern machine learning, and among practitioners interested in solving them, Stochastic Gradient Descent (SGD) reigns supreme. We revisit the analysis of SGD in the nonconvex setting and propose a new variant of the recently introduced expected smoothness assumption which governs the behaviour of the second moment of the stochastic gradient. We show that our assumption is both more general and more reasonable than assumptions made in all prior work. Moreover, our results yield the optimal $O(\varepsilon^{-4})$ rate for finding a stationary point of nonconvex smooth functions, and recover the optimal $O(\varepsilon^{-1})$ rate for finding a global solution if the Polyak-Łojasiewicz condition is satisfied. We compare against convergence rates under convexity and prove a theorem on the convergence of SGD…
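
A minimal sketch of plain minibatch SGD on a small finite sum whose components are smooth but nonconvex, tracking $\min_t \|\nabla f(x_t)\|$, the stationarity measure behind the $O(\varepsilon^{-4})$ rate. The test problem (least squares plus the penalty $\lambda \sum_j x_j^2/(1 + x_j^2)$, which has negative curvature where $x_j^2 > 1/3$), the constant `LAM`, the step size, and all function names are assumptions for illustration, not the paper's experimental setup.

```python
import math
import random

LAM = 0.1  # hypothetical weight of the nonconvex penalty


def make_problem(n=50, d=5, seed=0):
    """Random data for f(x) = (1/n) * sum_i f_i(x), where
    f_i(x) = 0.5 * (a_i . x - b_i)^2 + LAM * sum_j x_j^2 / (1 + x_j^2)."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return A, b


def grad_i(x, a, bi):
    """Gradient of one component f_i, using d/dx [x^2 / (1 + x^2)] = 2x / (1 + x^2)^2."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - bi
    return [r * aj + LAM * 2.0 * xj / (1.0 + xj * xj) ** 2 for aj, xj in zip(a, x)]


def full_grad(A, b, x):
    """Exact gradient of f: the average of all component gradients."""
    g = [0.0] * len(x)
    for a, bi in zip(A, b):
        g = [gj + gij for gj, gij in zip(g, grad_i(x, a, bi))]
    return [gj / len(A) for gj in g]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def sgd(A, b, x0, step, iters, batch=1, seed=1):
    """Minibatch SGD: x <- x - step * (mean of `batch` sampled component gradients),
    with indices drawn uniformly with replacement. Returns the last iterate and
    min_t ||grad f(x_t)|| over all iterates, including x0."""
    rng = random.Random(seed)
    n = len(A)
    x = list(x0)
    best = norm(full_grad(A, b, x))
    for _ in range(iters):
        g = [0.0] * len(x)
        for _ in range(batch):
            i = rng.randrange(n)
            g = [gj + gij / batch for gj, gij in zip(g, grad_i(x, A[i], b[i]))]
        x = [xj - step * gj for xj, gj in zip(x, g)]
        best = min(best, norm(full_grad(A, b, x)))
    return x, best


# Usage: start away from the data-fitting region and compare gradient norms.
A, b = make_problem()
x0 = [3.0] * 5
x, best = sgd(A, b, x0, step=0.01, iters=2000, batch=4)
print(norm(full_grad(A, b, x0)), best)
```

Evaluating the full gradient after every step costs a pass over all $n$ components and is done here only to report the stationarity measure; a practical run would check it occasionally.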

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Complexity and Algorithms in Graphs

Methods: Stochastic Gradient Descent