Stochastic Gradient Descent on Nonconvex Functions with General Noise Models
Vivak Patel, Shushu Zhang

TL;DR
This paper proves that stochastic gradient descent (SGD) either diverges or converges to a stationary point for a broad class of nonconvex functions and noise models, extending theoretical guarantees of SGD's behavior.
Contribution
It establishes convergence properties of SGD under very general nonconvex functions and noise models, relaxing many previous restrictive assumptions.
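As a rough sketch of the statement being summarized, the SGD recursion and the "diverge or converge" dichotomy can be written as below. The notation is assumed for illustration and is not taken from the paper: F is the objective, theta_k are the iterates, alpha_k are the step sizes, and g(theta_k, xi_k) is a stochastic gradient estimate built from the random sample xi_k.

```latex
% SGD update (assumed notation)
\theta_{k+1} = \theta_k - \alpha_k \, g(\theta_k, \xi_k)

% Dichotomy described in the summary: with probability one, the iterates
% either diverge to infinity or converge to a stationary point of F
\mathbb{P}\Big( \lim_{k \to \infty} \|\theta_k\| = \infty
  \ \text{ or } \
  \lim_{k \to \infty} \theta_k = \theta^{*} \text{ with } \nabla F(\theta^{*}) = 0 \Big) = 1
```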
Findings
SGD iterates diverge or converge to stationary points with probability one.
Gradient norms at SGD iterates tend to zero in probability and expectation under certain conditions.
These results extend rigorous convergence guarantees for SGD to a broader class of nonconvex problems.
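The behavior described in these findings can be illustrated with a small numerical experiment. The sketch below is not the paper's setting or proof technique. It runs SGD on a hypothetical one-dimensional double-well objective F(x) = x^4/4 - x^2/2, whose gradient x^3 - x is not globally Lipschitz, with additive Gaussian gradient noise and a decaying step size 0.1/(k+1)^0.75. The objective, the noise model, the step schedule, and the names `grad_F` and `run_sgd` are all assumptions chosen for illustration.

```python
import random


def grad_F(x):
    """Gradient of the toy objective F(x) = x**4 / 4 - x**2 / 2.

    Stationary points are x = -1, 0, 1 (assumed example, not from the paper).
    """
    return x ** 3 - x


def run_sgd(x0=2.0, n_steps=20000, noise_std=0.5, seed=0):
    """Run SGD with noisy gradients and a decaying step size; return the last iterate."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    x = x0
    for k in range(n_steps):
        # Step sizes sum to infinity while their squares sum to a finite value
        # (an assumed schedule; the summary does not state the paper's conditions).
        step = 0.1 / (k + 1) ** 0.75
        # Noisy gradient: true gradient plus zero-mean Gaussian noise.
        noisy_grad = grad_F(x) + rng.gauss(0.0, noise_std)
        x -= step * noisy_grad
    return x


x_final = run_sgd()
print("final iterate:", x_final)
print("gradient magnitude at final iterate:", abs(grad_F(x_final)))
```

In this run the iterate stays bounded and settles near one of the minimizers at x = ±1, where the gradient is close to zero. With a much larger initial step or starting point, the cubic gradient can instead drive the iterates to blow up, which is the other branch of the dichotomy.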
Abstract
Stochastic Gradient Descent (SGD) is a widely deployed optimization procedure throughout data-driven and simulation-driven disciplines, which has drawn substantial interest in understanding its global behavior across a broad class of nonconvex problems and noise models. Recent analyses of SGD have made noteworthy progress in this direction, and these analyses have introduced important and insightful new strategies for understanding SGD. However, these analyses have often imposed certain restrictions (e.g., convexity, global Lipschitz continuity, uniform Hölder continuity, expected smoothness, etc.) that leave room for innovation. In this work, we address this gap by proving that, for a rather general class of nonconvex functions and noise models, SGD's iterates either diverge to infinity or converge to a stationary point with probability one. By further restricting to globally Hölder…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
