A note on diffusion limits for stochastic gradient descent
Alberto Lanconelli, Christopher S. A. Lauria

TL;DR
This paper provides a rigorous theoretical justification for modeling stochastic gradient descent as a stochastic differential equation with Gaussian noise, an approximation widely used to study its purported implicit regularization in machine learning.
Contribution
It gives a novel rigorous justification showing how Gaussian noise arises naturally when stochastic gradient descent is approximated by a stochastic differential equation, supporting the use of that approximation in analysis.
Findings
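Gaussian noise in the SDE approximation of SGD arises naturally from the algorithm's dynamics.
The paper gives a rigorous theoretical justification for approximating SGD by an SDE with Gaussian noise.
This supports the Gaussian-noise models used in existing analyses of SGD's purported implicit regularization.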
Abstract
In the machine learning literature, stochastic gradient descent has recently been widely discussed for its purported implicit regularization properties. Much of the theory that attempts to clarify the role of noise in stochastic gradient algorithms approximates stochastic gradient descent by a stochastic differential equation with Gaussian noise. We provide a novel, rigorous theoretical justification for this practice that shows how the Gaussianity of the noise arises naturally.
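To make the approximation mentioned in the abstract concrete, here is a minimal sketch, not the paper's construction. It assumes a one-dimensional quadratic loss f(x) = mean((x - a_i)^2) / 2 and an SDE of the form dX = -f'(X) dt + sqrt(eta) * sigma dW, which is a form commonly used in the literature. The helper names (`make_steps`, `simulate`, `tail_variance`), the data, and the learning rate `eta` are all hypothetical choices made for illustration. The sketch runs single-sample SGD next to an Euler-Maruyama discretization of the SDE with time step `eta` and compares their long-run variances.

```python
import math
import random


def simulate(step, x0, n_steps, rng):
    """Iterate x <- step(x, rng) and return the whole trajectory."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1], rng))
    return xs


def make_steps(data, eta):
    """Build an SGD step and an Euler-Maruyama step for f(x) = mean((x - a_i)^2) / 2."""
    m = sum(data) / len(data)
    # Standard deviation of the per-sample gradient (x - a_i) around the full gradient (x - m).
    sigma = math.sqrt(sum((a - m) ** 2 for a in data) / len(data))

    def sgd_step(x, rng):
        a = rng.choice(data)      # sample one data point uniformly
        return x - eta * (x - a)  # step along the per-sample gradient

    def sde_step(x, rng):
        # Euler-Maruyama for the assumed SDE dX = -(X - m) dt + sqrt(eta) * sigma dW, dt = eta.
        dw = math.sqrt(eta) * rng.gauss(0.0, 1.0)
        return x - eta * (x - m) + math.sqrt(eta) * sigma * dw

    return sgd_step, sde_step, m, sigma


def tail_variance(xs, burn_in):
    """Empirical variance of the trajectory after discarding the first burn_in points."""
    tail = xs[burn_in:]
    mu = sum(tail) / len(tail)
    return sum((x - mu) ** 2 for x in tail) / len(tail)


rng = random.Random(0)
data = [rng.uniform(-1.0, 1.0) for _ in range(50)]
eta = 0.05
sgd_step, sde_step, m, sigma = make_steps(data, eta)

sgd_var = tail_variance(simulate(sgd_step, 2.0, 200_000, rng), 1_000)
sde_var = tail_variance(simulate(sde_step, 2.0, 200_000, rng), 1_000)
# Stationary variance of both discrete recursions: v = (1 - eta)^2 v + eta^2 sigma^2.
predicted = eta * sigma ** 2 / (2.0 - eta)

print(f"SGD variance:       {sgd_var:.6f}")
print(f"Gaussian-noise SDE: {sde_var:.6f}")
print(f"Predicted:          {predicted:.6f}")
```

In this toy setting the gradient noise has the same variance at every x, so the two recursions share their first two moments exactly and the three printed values should be close. The sketch only illustrates what replacing sampling noise with Gaussian noise means; it says nothing about the conditions under which the paper proves the approximation.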
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Mathematical Biology Tumor Growth
