From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent

Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan

arXiv:2210.06705 · cs.LG · October 14, 2022

TL;DR

This paper establishes a theoretical framework connecting the convergence of gradient flow (GF) on the population loss to the convergence of stochastic gradient descent (SGD), yielding general conditions under which SGD succeeds on non-convex models.

Contribution

It introduces a general converse-Lyapunov-like theorem that links the convergence of GF on the population loss to the convergence of SGD, and applies it to a range of non-convex problems, including phase retrieval and matrix square root.
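As a rough illustration of the GF-to-SGD connection, the sketch below uses a hypothetical 1×1 instance of the matrix square-root problem: minimize the population loss F(x) = E[(x² − (A + ξ))²]/4 = (x² − A)²/4 + σ²/4, where ξ is zero-mean Gaussian noise with standard deviation σ. The noise model, the constants `A` and `SIGMA`, the step sizes, and all function names are assumptions made for illustration, not the paper's setup. The code runs a forward-Euler discretization of GF on F and constant-step SGD on single noisy samples from the same starting point; both should end near √A = 2.

```python
import random

A = 4.0        # hypothetical target; the positive minimizer of (x^2 - A)^2 / 4 is sqrt(A) = 2
SIGMA = 1.0    # hypothetical std of the Gaussian noise on each sampled target


def population_grad(x):
    # Gradient of the population loss F(x) = (x^2 - A)^2 / 4 + SIGMA^2 / 4
    return x * (x * x - A)


def sample_grad(x, rng):
    # Gradient of the per-sample loss f(x; xi) = (x^2 - (A + xi))^2 / 4;
    # it is an unbiased estimate of population_grad(x) because E[xi] = 0
    xi = rng.gauss(0.0, SIGMA)
    return x * (x * x - (A + xi))


def gradient_flow(x0, horizon=10.0, dt=1e-3):
    # Forward-Euler discretization of dx/dt = -grad F(x)
    x = x0
    for _ in range(int(round(horizon / dt))):
        x -= dt * population_grad(x)
    return x


def sgd(x0, lr=0.01, steps=20000, seed=0):
    # Constant-step SGD on single noisy samples;
    # returns the average of the second half of the iterates (tail averaging)
    rng = random.Random(seed)
    x = x0
    tail_sum, tail_count = 0.0, 0
    for t in range(steps):
        x -= lr * sample_grad(x, rng)
        if t >= steps // 2:
            tail_sum += x
            tail_count += 1
    return tail_sum / tail_count


if __name__ == "__main__":
    print(f"GF endpoint:  {gradient_flow(0.5):.4f}")
    print(f"SGD tail avg: {sgd(0.5):.4f}")
```

Tail averaging is used for SGD because, with a constant step size, the iterates keep fluctuating around √A at a scale that shrinks with `lr`; this is a common practical choice, not something the paper prescribes. Note that x = 0 is a stationary point of this non-convex loss, so both methods are started at x = 0.5 rather than at the origin.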

Findings

01

Unified analysis for GD/SGD across classical and complex problems

02

Conditions under which SGD converges, given that GF on the population loss converges

03

Extension of results to non-convex and structured objectives

Abstract

Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been a lot of recent progress in understanding the convergence of Gradient Flow (GF) on the population loss, partly due to the simplicity that a continuous-time analysis buys us. An overarching theme of our paper is providing general conditions under which SGD converges, assuming that GF on the population loss converges. Our main tool to establish this connection is a general converse-Lyapunov-like theorem, which implies the existence of a Lyapunov potential under mild assumptions on the rates of convergence of GF. In fact, using these potentials, we show a one-to-one correspondence between rates of convergence of GF and geometrical properties of the underlying objective. When these potentials further satisfy…
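Gradient flow always admits one simple Lyapunov potential, the population loss itself, since dF(x(t))/dt = −‖∇F(x(t))‖² ≤ 0; the paper's converse theorem is far more general and is not reproduced here. This hypothetical sketch uses a 2-D real phase-retrieval population loss, assuming Gaussian measurement vectors a ~ N(0, I) and a planted signal x* = (1, 0), so that F(x) = E[((a·x)² − (a·x*)²)²]/4 has a closed form. It checks that F never increases along a forward-Euler discretization of GF and reports the ratio ‖∇F‖²/(F − F*) at the endpoint (here F* = 0). A lower bound ‖∇F‖² ≥ 2μ(F − F*), a Polyak–Łojasiewicz-type inequality, is the textbook example of a geometric property that gives GF a linear rate, F(x(t)) − F* ≤ e^(−2μt)(F(x(0)) − F*); it is used here only as one standard instance of a rate/geometry link, not as the paper's construction. All names and constants are assumptions.

```python
X_STAR = (1.0, 0.0)  # hypothetical planted signal; recoverable only up to sign


def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]


def population_loss(x):
    # F(x) = E[((a.x)^2 - (a.x*)^2)^2] / 4 for a ~ N(0, I_2).
    # With u = x - x*, v = x + x*, Isserlis' theorem gives
    # F(x) = (|u|^2 |v|^2 + 2 (u.v)^2) / 4, which avoids cancellation near x*.
    u = (x[0] - X_STAR[0], x[1] - X_STAR[1])
    v = (x[0] + X_STAR[0], x[1] + X_STAR[1])
    return (dot(u, u) * dot(v, v) + 2.0 * dot(u, v) ** 2) / 4.0


def population_grad(x):
    # grad F(x) = (3|x|^2 - |x*|^2) x - 2 (x.x*) x*
    c = 3.0 * dot(x, x) - dot(X_STAR, X_STAR)
    p = dot(x, X_STAR)
    return (c * x[0] - 2.0 * p * X_STAR[0], c * x[1] - 2.0 * p * X_STAR[1])


def gf_trajectory(x0, horizon=5.0, dt=1e-3):
    # Forward-Euler discretization of gradient flow; records F after every step
    x = x0
    losses = [population_loss(x)]
    for _ in range(int(round(horizon / dt))):
        g = population_grad(x)
        x = (x[0] - dt * g[0], x[1] - dt * g[1])
        losses.append(population_loss(x))
    return x, losses


if __name__ == "__main__":
    x_end, losses = gf_trajectory((0.5, 0.5))
    g = population_grad(x_end)
    print("F never increased:", all(b <= a for a, b in zip(losses, losses[1:])))
    print(f"final F: {losses[-1]:.3e}, |grad F|^2 / F: {dot(g, g) / losses[-1]:.3f}")
```

Near x* the ratio should settle close to 4, twice the smallest Hessian eigenvalue (2) of F at x*, which corresponds to a local linear rate. Far from x*, for example near the saddle points orthogonal to x* with ‖x‖² = 1/3, the gradient vanishes while F > 0, so the ratio can be arbitrarily small there.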

Videos

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Adversarial Robustness in Machine Learning

Methods: Stochastic Gradient Descent