From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent
Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan

TL;DR
This paper establishes a theoretical framework connecting the convergence of Gradient Flow on population loss to the convergence of Stochastic Gradient Descent, providing new insights into when and why SGD succeeds in training complex models.
Contribution
It introduces a general converse Lyapunov theorem linking GF convergence to SGD convergence, applicable to a wide range of non-convex problems including phase retrieval and matrix square-root.
Findings
Unified analysis for GD/SGD across classical and complex problems
Conditions under which SGD converges based on GF convergence
Extension of results to non-convex and structured objectives
Abstract
Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been a lot of recent progress in understanding the convergence of Gradient Flow (GF) on the population loss, partly due to the simplicity that a continuous-time analysis buys us. An overarching theme of our paper is providing general conditions under which SGD converges, assuming that GF on the population loss converges. Our main tool to establish this connection is a general converse Lyapunov-like theorem, which implies the existence of a Lyapunov potential under mild assumptions on the rates of convergence of GF. In fact, using these potentials, we show a one-to-one correspondence between rates of convergence of GF and geometrical properties of the underlying objective. When these potentials further satisfy…
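To make the GF-versus-SGD relationship concrete, here is a minimal sketch on a toy problem that is not taken from the paper: a scalar analogue of the matrix square-root objective, with population loss F(w) = (w^2 - M)^2 / 4 (up to an additive constant) and noisy samples y = M + noise. The constant `M`, the noise level, the step sizes, and all function names are assumptions chosen for illustration. Gradient flow is approximated by a small-step Euler discretization of dw/dt = -F'(w), and SGD uses the unbiased per-sample gradient (w^2 - y) w. The sketch does not construct the paper's Lyapunov potential; it only shows both methods reaching the same minimizer from the same starting point.

```python
import random

M = 4.0  # assumed population target; the minimizers of F are +/- sqrt(M) = +/- 2


def population_grad(w):
    """Gradient of the population loss F(w) = (w^2 - M)^2 / 4."""
    return (w * w - M) * w


def stochastic_grad(w, rng, noise_std=0.5):
    """Unbiased gradient estimate from one noisy sample y = M + noise."""
    y = M + rng.gauss(0.0, noise_std)
    return (w * w - y) * w


def gradient_flow(w0, horizon=20.0, dt=1e-3):
    """Approximate GF on the population loss with explicit Euler steps."""
    w = w0
    for _ in range(int(horizon / dt)):
        w -= dt * population_grad(w)
    return w


def sgd(w0, steps=20000, lr=1e-3, seed=0):
    """Plain SGD with a constant step size and one fresh sample per step."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        w -= lr * stochastic_grad(w, rng)
    return w


# Start near the stationary point at w = 0, on the positive side.
w_gf = gradient_flow(0.1)
w_sgd = sgd(0.1)
print(f"gradient flow (Euler): {w_gf:.4f}")
print(f"SGD:                   {w_sgd:.4f}")
```

With these settings both iterates end close to the positive root sqrt(M) = 2; the SGD iterate fluctuates around it at a scale set by the step size and noise level.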
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Adversarial Robustness in Machine Learning
Methods: Stochastic Gradient Descent
