Stochastic Differential Equations models for Least-Squares Stochastic Gradient Descent

Adrien Schertzer; Loucas Pillaud-Vivien

arXiv:2407.02322·cs.LG·July 3, 2024

TL;DR

This paper models the dynamics of stochastic gradient descent for least-squares problems using stochastic differential equations, providing convergence rates, distribution characterizations, and insights into heavy-tail phenomena.

Contribution

The paper introduces a continuous-time SDE framework for analyzing SGD on least-squares problems, extending the work of Li et al. (2019) with non-asymptotic convergence rates and a description of the stationary distribution.

Findings

01. Non-asymptotic convergence rates to stationary distribution
02. Characterization of the asymptotic distribution including mean and deviations
03. Identification of heavy-tail emergence related to step-size

Abstract

We study the dynamics of a continuous-time model of Stochastic Gradient Descent (SGD) for the least-squares problem. Building on the work of Li et al. (2019), we analyze Stochastic Differential Equations (SDEs) that model SGD either for the training loss (finite samples) or for the population loss (online setting). A key qualitative feature of the dynamics is the existence of a perfect interpolator of the data, irrespective of the sample size. In both scenarios, we provide precise, non-asymptotic rates of convergence to the (possibly degenerate) stationary distribution. Additionally, we describe this asymptotic distribution, offering estimates of its mean, deviations from it, and a proof of the emergence of heavy tails related to the step-size magnitude. Numerical simulations supporting our findings are also presented.
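The role of a perfect interpolator can be illustrated with a small simulation. The sketch below is not the paper's SDE model or its experiments: it runs plain discrete-time, single-sample SGD on a synthetic two-dimensional least-squares problem, and every name and value in it (`make_data`, `sgd_path`, `tail_spread`, the step size 0.05, the noise level 0.5) is an assumption chosen for illustration. With noiseless labels a perfect interpolator exists and the iterates settle on a single point, which is the degenerate case; with noisy labels they keep fluctuating.

```python
import random


def make_data(n, theta_star, noise_std, seed=1):
    """Synthetic data: Gaussian inputs, labels <x, theta_star> + noise (illustrative)."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in theta_star] for _ in range(n)]
    ys = [
        sum(t * xi for t, xi in zip(theta_star, x)) + noise_std * rng.gauss(0.0, 1.0)
        for x in xs
    ]
    return xs, ys


def sgd_path(xs, ys, gamma, n_steps, seed=0):
    """Single-sample SGD on the training loss; returns every iterate after each step."""
    rng = random.Random(seed)
    theta = [0.0] * len(xs[0])
    path = []
    for _ in range(n_steps):
        i = rng.randrange(len(xs))  # draw one training sample uniformly
        x, y = xs[i], ys[i]
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        # gradient of 0.5 * (<theta, x> - y)^2 with respect to theta is residual * x
        theta = [t - gamma * residual * xi for t, xi in zip(theta, x)]
        path.append(theta)
    return path


def tail_spread(path, k=500):
    """Largest squared distance between the last k iterates and their average."""
    tail = path[-k:]
    mean = [sum(coord) / len(tail) for coord in zip(*tail)]
    return max(sum((p - m) ** 2 for p, m in zip(point, mean)) for point in tail)


theta_star = [1.0, -2.0]  # hypothetical ground-truth parameter

# Noiseless labels: theta_star fits every sample exactly (a perfect interpolator).
xs_clean, ys_clean = make_data(50, theta_star, noise_std=0.0)
spread_clean = tail_spread(sgd_path(xs_clean, ys_clean, gamma=0.05, n_steps=5000))

# Noisy labels: no parameter fits all 50 samples, so the gradient noise never vanishes.
xs_noisy, ys_noisy = make_data(50, theta_star, noise_std=0.5)
spread_noisy = tail_spread(sgd_path(xs_noisy, ys_noisy, gamma=0.05, n_steps=5000))

print("tail spread with interpolator:   ", spread_clean)
print("tail spread without interpolator:", spread_noisy)
```

In the noiseless run the stochastic gradient vanishes at the interpolator, so the tail spread collapses to numerical zero. In the noisy run the spread stays strictly positive at a constant step size, consistent with a non-degenerate stationary distribution.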

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Numerical methods in inverse problems · Advanced Optimization Algorithms Research

Methods: Stochastic Gradient Descent