Better Theory for SGD in the Nonconvex World
Ahmed Khaled, Peter Richtárik

TL;DR
This paper advances the theoretical understanding of SGD in nonconvex optimization by introducing a more general assumption, achieving optimal convergence rates, and analyzing various sampling strategies with experimental validation.
Contribution
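It proposes a new expected smoothness assumption on the second moment of the stochastic gradient, which the authors show to be more general and more realistic than the assumptions used in prior analyses of SGD. Under this assumption it establishes optimal convergence rates both for general smooth nonconvex problems and for problems satisfying the Polyak-Łojasiewicz (PL) condition.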
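For orientation, the two conditions named above can be written out as follows. This is a sketch in commonly used notation rather than text taken from this page: the symbols $g(x)$ (an unbiased stochastic gradient of $f$), $f^{\inf}$ (a lower bound on $f$), $f^{\star}$ (the minimum value of $f$), and the constants $A, B, C \ge 0$ and $\mu > 0$ are assumptions of this note, and the paper itself should be consulted for the exact statement.

```latex
% Expected smoothness: the second moment of the stochastic gradient is
% bounded by the suboptimality, the squared gradient norm, and a constant.
\[
  \mathbb{E}\left[\|g(x)\|^2\right]
  \;\le\; 2A\left(f(x) - f^{\inf}\right) + B\,\|\nabla f(x)\|^2 + C .
\]

% Polyak-Lojasiewicz (PL) condition with parameter \mu > 0:
\[
  \tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\left(f(x) - f^{\star}\right) .
\]
```

Setting $A = 0$ and $B = 1$ gives the familiar bounded-variance condition, which is one way to see why a bound of this shape covers earlier assumptions as special cases.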
Findings
Achieves the optimal $\mathcal{O}(\varepsilon^{-4})$ rate for finding stationary points of smooth nonconvex functions.
Obtains the optimal $\mathcal{O}(\varepsilon^{-1})$ rate under the Polyak-Łojasiewicz condition.
Validates theoretical results with experiments on real and synthetic data.
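To make the setting concrete, here is a minimal sketch of minibatch SGD on a small synthetic nonconvex problem. It is not the authors' experimental code: the objective (least squares plus the nonconvex regularizer $\lambda \sum_j x_j^2 / (1 + x_j^2)$), the uniform sampling without replacement, and all names and default values (`make_problem`, `sgd`, `lam`, `step`, `batch_size`) are assumptions chosen for illustration.

```python
import numpy as np


def make_problem(n=200, d=10, noise=0.1, seed=1):
    """Build a synthetic regression problem (hypothetical data, for illustration only)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    x_true = rng.standard_normal(d)
    b = A @ x_true + noise * rng.standard_normal(n)
    return A, b


def full_gradient(x, A, b, lam):
    """Gradient of f(x) = mean_i 0.5*(a_i^T x - b_i)^2 + lam * sum_j x_j^2 / (1 + x_j^2)."""
    residual = A @ x - b
    grad_fit = A.T @ residual / A.shape[0]
    # d/dx [x^2 / (1 + x^2)] = 2x / (1 + x^2)^2; this term makes f nonconvex.
    grad_reg = lam * 2.0 * x / (1.0 + x**2) ** 2
    return grad_fit + grad_reg


def sgd(A, b, lam=0.1, step=0.01, batch_size=8, iters=2000, seed=0):
    """Constant-step minibatch SGD with uniform sampling without replacement."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        idx = rng.choice(n, size=batch_size, replace=False)
        # The minibatch gradient is an unbiased estimate of the full gradient.
        g = full_gradient(x, A[idx], b[idx], lam)
        x = x - step * g
    return x


if __name__ == "__main__":
    A, b = make_problem()
    x0 = np.zeros(A.shape[1])
    x = sgd(A, b)
    print("initial gradient norm:", np.linalg.norm(full_gradient(x0, A, b, 0.1)))
    print("final gradient norm:  ", np.linalg.norm(full_gradient(x, A, b, 0.1)))
```

The quantity printed at the end, the norm of the full gradient, is the stationarity measure that the $\varepsilon^{-4}$ rate refers to. Changing how `idx` is drawn (for example, sampling with replacement or with non-uniform probabilities) is the kind of sampling-strategy variation the paper analyzes.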
Abstract
Large-scale nonconvex optimization problems are ubiquitous in modern machine learning, and among practitioners interested in solving them, Stochastic Gradient Descent (SGD) reigns supreme. We revisit the analysis of SGD in the nonconvex setting and propose a new variant of the recently introduced expected smoothness assumption which governs the behaviour of the second moment of the stochastic gradient. We show that our assumption is both more general and more reasonable than assumptions made in all prior work. Moreover, our results yield the optimal rate for finding a stationary point of nonconvex smooth functions, and recover the optimal rate for finding a global solution if the Polyak-Łojasiewicz condition is satisfied. We compare against convergence rates under convexity and prove a theorem on the convergence of SGD…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Complexity and Algorithms in Graphs
Methods: Stochastic Gradient Descent
