Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics
Igor Ignashin, Anna Radovskaya, Andrew Semenov, Egor Lopatin, Stanislav Potapov, Aleksandr Kovalenko, Andrey Veprikov, Aleksandr Shestakov, Andrey Leonidov, Aleksandr Beznosikov

TL;DR
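This paper challenges the common practice of modeling SGD as a Langevin process driven by Brownian motion. It proposes a framework derived directly from the discrete SGD update, shows that this framework predicts qualitatively distinct behaviors near critical points of the loss, and validates the predictions empirically on neural networks.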
Contribution
It introduces a new discrete formulation of SGD dynamics that departs from the Langevin approximation, and analyzes the behavior of these dynamics near critical points of the loss.
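For orientation, the two objects being contrasted can be written out as follows. This is the commonly used textbook form of the SGD update and of its Langevin (SDE) approximation, not the paper's own notation; the symbols B_k (minibatch at step k) and Sigma (covariance of the minibatch gradient) are assumptions introduced here for illustration.

```latex
% Discrete SGD update with learning rate \eta and minibatch B_k
\theta_{k+1} = \theta_k - \eta \, \nabla L_{B_k}(\theta_k)

% Commonly used Langevin approximation, with continuous time t = k\eta
% and \Sigma(\theta) the covariance of the minibatch gradient (assumed notation)
d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\eta}\, \Sigma(\theta_t)^{1/2}\, dW_t
```

The sqrt(eta) prefactor on the noise term belongs to the continuous-time approximation in the second line; the first line is the finite-learning-rate update that the paper takes as its starting point.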
Findings
Variance grows over time in nearly-flat directions, indicating diffusion along valleys.
Empirical evidence shows a separation between confined and diffusive modes in neural networks.
The derived discrete Fokker-Planck equation differs from the standard Langevin form at order eta^2.
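The first two findings can be illustrated with a toy simulation. The sketch below is not the paper's model: it assumes a quadratic loss with additive Gaussian gradient noise, and the function name `simulate_sgd_variance` and all parameter values are hypothetical choices made for this example. It applies the discrete update independently along each Hessian eigen-direction and tracks the variance across many runs.

```python
import numpy as np

def simulate_sgd_variance(curvatures, eta=0.1, sigma=1.0,
                          n_steps=500, n_runs=4000, seed=0):
    """Variance across runs of theta_k for a toy noisy quadratic.

    Assumed toy model (not taken from the paper): along an eigen-direction
    with curvature lam, the update is
        theta <- theta - eta * (lam * theta + xi),  xi ~ N(0, sigma^2).
    Returns an array of shape (n_steps, len(curvatures)).
    """
    rng = np.random.default_rng(seed)
    lam = np.asarray(curvatures, dtype=float)
    theta = np.zeros((n_runs, lam.size))       # every run starts at the critical point
    variances = np.empty((n_steps, lam.size))
    for k in range(n_steps):
        noise = sigma * rng.standard_normal(theta.shape)
        theta = theta - eta * (lam * theta + noise)   # discrete SGD step
        variances[k] = theta.var(axis=0)              # spread across independent runs
    return variances

if __name__ == "__main__":
    # One curved direction (lam = 1) and one exactly flat direction (lam = 0).
    var = simulate_sgd_variance([1.0, 0.0])
    for step in (10, 100, 500):
        curved, flat = var[step - 1]
        print(f"step {step:4d}  curved var = {curved:.4f}  flat var = {flat:.4f}")
```

In this toy setting the curved direction settles to a stationary variance of eta^2 sigma^2 / (1 - (1 - eta lam)^2), while the flat direction performs a random walk whose variance grows as eta^2 sigma^2 k. This mirrors the separation between confined and diffusive modes described above, although the paper's own analysis starts from a fluctuating loss landscape rather than from additive noise.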
Abstract
Stochastic Gradient Descent (SGD) is commonly modeled as a Langevin process, assuming that minibatch noise acts as Brownian motion. However, this approximation relies on a continuous-time limit and a sqrt(eta) noise scaling that does not match the discrete SGD update at finite learning rate. In this work, we propose an alternative formulation of SGD as deterministic dynamics in a fluctuating loss landscape induced by minibatch sampling. Starting directly from the discrete update, we derive a master equation for the parameter distribution and obtain a discrete Fokker-Planck equation that differs from the standard Langevin form at order eta^2. Using this framework, we analyze SGD dynamics near critical points of the loss. We show that the behavior decomposes along the eigenbasis of the mean Hessian into qualitatively distinct regimes. In particular, nearly-flat directions do not admit a…
