What is the long-run distribution of stochastic gradient descent? A large deviations analysis

Waïss Azizian; Franck Iutzeler; Jérôme Malick; Panayotis Mertikopoulos

arXiv:2406.09241 · math.OC · May 19, 2026

TL;DR

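This paper analyzes the long-run behavior of stochastic gradient descent (SGD) in non-convex optimization. It shows that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of a thermodynamic system. In that analogy, the temperature equals the step-size and the energy levels are determined by the objective and the statistics of the noise.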

Contribution

It uses the theory of large deviations and randomly perturbed dynamical systems to characterize the long-run distribution of SGD. This connects SGD to equilibrium thermodynamics and quantifies which regions of the state space SGD visits most often, and by how much.

Findings

01

SGD's long-run distribution resembles a Boltzmann-Gibbs distribution.

02

Critical regions are visited exponentially more often than non-critical ones.

03

SGD concentrates around minimum energy states, not necessarily global minima.
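The following Python sketch shows how Boltzmann-Gibbs weights produce the exponential separations listed above. It assumes, purely for illustration, weights proportional to exp(-E/η), with temperature equal to the step-size η. The energy values are hypothetical placeholders. In the paper, the energy levels are determined by the objective and the noise statistics.

```python
import math

def gibbs_probabilities(energies, temperature):
    """Normalized Boltzmann-Gibbs weights p_i proportional to exp(-E_i / T)."""
    e_min = min(energies)  # shift by the minimum energy for numerical stability
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical energy levels (illustrative only, not taken from the paper):
# the minimum-energy state, a higher-energy critical region, a non-critical region.
energies = [0.0, 0.2, 1.0]

# Temperature plays the role of the SGD step-size in the analogy.
for step_size in (0.2, 0.1, 0.05):
    probs = gibbs_probabilities(energies, temperature=step_size)
    print(step_size, [round(p, 6) for p in probs])
```

Since p_i / p_j = exp((E_j - E_i) / η), halving the step-size squares the odds ratio between any two states. This is the sense in which lower-energy regions are visited exponentially more often as η shrinks.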

Abstract

In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In particular, we show that, in the long run, (a) the problem's critical region is visited exponentially more often than any non-critical region; (b) the iterates of SGD are exponentially concentrated around the problem's minimum energy state (which does not always coincide with the…
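A small simulation can make the abstract's claim concrete. The sketch below runs constant step-size SGD on a hypothetical one-dimensional asymmetric double well. It assumes additive Gaussian gradient noise that does not depend on the state. It then compares the fraction of iterates in the deeper well with a heuristic Gibbs prediction taken from the standard small-step diffusion approximation. The objective `f`, the noise model, and the parameters `step`, `sigma`, and `seed` are illustrative choices, not the paper's setup.

```python
import numpy as np

# Hypothetical 1-D test objective: an asymmetric double well.
# f(x) = x^4/4 - x^2/2 + 0.1 x has a deeper minimum near x = -1.05
# and a shallower one near x = 0.95.
def f(x):
    return x**4 / 4 - x**2 / 2 + 0.1 * x

def grad_f(x):
    return x**3 - x + 0.1

# Local maximum of f separating the two wells (middle real root of f').
SADDLE = float(np.sort(np.roots([1.0, 0.0, -1.0, 0.1]).real)[1])

def run_sgd(x0=0.95, step=0.05, sigma=2.0, n_iters=400_000, seed=0):
    """Constant step-size SGD started in the shallower well."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_iters).tolist()
    xs = np.empty(n_iters)
    x = x0
    for n in range(n_iters):
        # Assumed noisy gradient oracle: exact gradient plus N(0, sigma^2) noise.
        x = x - step * (grad_f(x) + sigma * noise[n])
        xs[n] = x
    return xs

def gibbs_left_fraction(step=0.05, sigma=2.0):
    """Mass of the deeper well under the density exp(-2 f / (step * sigma^2)).

    Heuristic: this is the stationary density of the diffusion
    dX = -f'(X) dt + sigma * sqrt(step) dW, which approximates SGD for small steps.
    """
    grid = np.linspace(-3.0, 3.0, 20_001)
    weights = np.exp(-2.0 * f(grid) / (step * sigma**2))
    return float(weights[grid < SADDLE].sum() / weights.sum())

xs = run_sgd()
empirical = float(np.mean(xs < SADDLE))  # fraction of iterates in the deeper well
predicted = gibbs_left_fraction()
print(f"time in deeper well: {empirical:.3f} (Gibbs heuristic: {predicted:.3f})")
```

Under this state-independent noise, the energy is heuristically proportional to `f`. The deeper well therefore dominates the long run, even though SGD starts in the shallower one. The paper's energy levels also depend on the noise statistics, so a state-dependent noise model could change which minimum SGD favors.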

Videos

What is the Long-Run Distribution of Stochastic Gradient Descent? A Large Deviations Analysis (SlidesLive)

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Advanced Thermodynamics and Statistical Mechanics · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent