On uniform-in-time diffusion approximation for stochastic gradient descent

Lei Li; Yuliang Wang

arXiv:2207.04922·stat.ML·July 12, 2022

TL;DR

This paper proves that stochastic gradient descent (SGD) can be approximated uniformly over time by a diffusion process under mild conditions, enabling long-term analysis of SGD even without convexity of individual loss functions.

Contribution

The paper establishes the first uniform-in-time diffusion approximation for SGD under weak assumptions (strong convexity of the expected loss plus some mild conditions), extending the analysis beyond finite time intervals.

Findings

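01 A uniform-in-time diffusion approximation for SGD is proven.

02 Exponential decay rates for the derivatives of the solution to the backward Kolmogorov equation are the key technical ingredient.

03 The result allows long-term study of SGD dynamics via the SDE without requiring each individual random loss to be convex.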

Abstract

The diffusion approximation of stochastic gradient descent (SGD) in the current literature is only valid on a finite time interval. In this paper, we establish the uniform-in-time diffusion approximation of SGD, assuming only that the expected loss is strongly convex, together with some other mild conditions, and without assuming the convexity of each random loss function. The main technique is to establish the exponential decay rates of the derivatives of the solution to the backward Kolmogorov equation. The uniform-in-time approximation allows us to study asymptotic behaviors of SGD via the continuous stochastic differential equation (SDE) even when the random objective function $f(\cdot; \xi)$ is not strongly convex.
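The setting described in the abstract can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy loss f(x; xi) = x^2/2 + xi*sin(x) with xi = +2 or -2 with equal probability (each f is non-convex, while the expected loss F(x) = x^2/2 is strongly convex), the step size ETA, the test function phi(x) = x^2, and the use of the commonly cited first-order diffusion approximation dX = -F'(X) dt + sqrt(ETA) * sigma(X) dW, whose exact form in the paper may differ. The sketch compares E[phi] under SGD after n steps with E[phi] under the SDE at time n*ETA.

```python
import math
import random

ETA = 0.05               # SGD step size (illustrative choice)
XI_VALUES = (-2.0, 2.0)  # xi takes each value with probability 1/2


def grad_f(x, xi):
    """Gradient of the toy random loss f(x; xi) = x**2/2 + xi*sin(x)."""
    return x + xi * math.cos(x)


def grad_F(x):
    """Gradient of the expected loss F(x) = x**2/2 (E[xi] = 0)."""
    return x


def sigma(x):
    """Standard deviation of grad_f(x, xi) over xi, i.e. 2*|cos(x)|."""
    return 2.0 * abs(math.cos(x))


def sgd_path(x0, n_steps, rng):
    """Run n_steps of SGD with step size ETA and return the final iterate."""
    x = x0
    for _ in range(n_steps):
        x -= ETA * grad_f(x, rng.choice(XI_VALUES))
    return x


def sde_path(x0, n_steps, rng, substeps=2):
    """Euler-Maruyama for dX = -F'(X) dt + sqrt(ETA)*sigma(X) dW
    up to time n_steps*ETA, using time step ETA/substeps."""
    dt = ETA / substeps
    x = x0
    for _ in range(n_steps * substeps):
        noise = rng.gauss(0.0, 1.0)
        x += -grad_F(x) * dt + math.sqrt(ETA * dt) * sigma(x) * noise
    return x


def weak_error(n_steps, n_paths=1000, seed=0):
    """Monte Carlo estimate of |E phi(SGD_n) - E phi(X_{n*ETA})|, phi(x) = x**2."""
    rng = random.Random(seed)
    sgd = sum(sgd_path(1.0, n_steps, rng) ** 2 for _ in range(n_paths)) / n_paths
    sde = sum(sde_path(1.0, n_steps, rng) ** 2 for _ in range(n_paths)) / n_paths
    return abs(sgd - sde)


if __name__ == "__main__":
    for n in (20, 200):
        print(n, weak_error(n))
```

Running the script prints the estimated gap at a short and a long horizon. In this toy problem the gap would be expected to stay small as the number of steps grows, which is the kind of behavior a uniform-in-time bound describes, although Monte Carlo noise and the Euler-Maruyama discretization limit what such a check can show.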

Taxonomy

Topics: Mathematical Biology Tumor Growth · Stochastic processes and financial applications · Markov Chains and Monte Carlo Methods

Methods: Diffusion · Exponential Decay · Stochastic Gradient Descent