Rigorous dynamical mean field theory for stochastic gradient descent methods
Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborova

TL;DR
This paper derives exact high-dimensional asymptotic equations for first-order gradient methods such as SGD, connects them to dynamical mean-field theory from physics, and provides numerical tools for analyzing their behavior on Gaussian data.
Contribution
It introduces a rigorous framework linking first-order gradient-based algorithms to dynamical mean-field theory. The framework covers non-separable update functions, which allows datasets with non-identity covariance matrices.
Findings
Proved closed-form equations for the exact high-dimensional asymptotics of first-order gradient-based methods
Connected these equations to dynamical mean-field theory from physics
Provided numerical implementations for SGD with various batch sizes and learning rates
Abstract
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch size and with constant learning…
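To make the setting concrete, the following is a minimal sketch of the kind of algorithm the abstract refers to: SGD with an extensive batch size and a constant learning rate, run on Gaussian data with a non-identity covariance. It is not the paper's implementation and it does not solve the DMFT equations; it only simulates the finite-size dynamics those equations would describe. The ridge-regression loss, the diagonal covariance, the Bernoulli batch selection with fraction `b`, and every name and parameter value (`sgd_gaussian`, `lam`, `lr`, and so on) are assumptions chosen for illustration.

```python
import numpy as np

def sgd_gaussian(n=400, d=200, b=0.5, lr=0.05, lam=0.01, steps=300, seed=0):
    """Simulate extensive-batch SGD on a ridge-regularized least-squares risk.

    All modelling choices here are illustrative assumptions, not the paper's setup.
    Returns the per-sample training risk recorded before each update.
    """
    rng = np.random.default_rng(seed)

    # Gaussian data with a non-identity (here diagonal) covariance,
    # scaled by 1/d so that X @ w stays O(1) as d grows.
    scales = np.linspace(0.5, 1.5, d)
    X = rng.standard_normal((n, d)) * np.sqrt(scales / d)

    # Hypothetical teacher vector and noisy labels.
    w_star = rng.standard_normal(d)
    y = X @ w_star + 0.1 * rng.standard_normal(n)

    w = np.zeros(d)
    risks = np.empty(steps)
    for t in range(steps):
        resid = X @ w - y
        # Full empirical risk (sum over samples plus ridge), reported per sample.
        risks[t] = (0.5 * np.sum(resid**2) + 0.5 * lam * np.sum(w**2)) / n

        # Extensive batch: each sample is kept independently with probability b,
        # so the batch contains about b * n samples at every step.
        mask = rng.random(n) < b

        # Stochastic gradient of the risk; dividing by b keeps it unbiased.
        grad = X.T @ (mask * resid) / b + lam * w

        # Constant learning rate.
        w -= lr * grad
    return risks

if __name__ == "__main__":
    risks = sgd_gaussian()
    print(f"initial risk per sample: {risks[0]:.4f}")
    print(f"final risk per sample:   {risks[-1]:.4f}")
```

Setting `b=1.0` recovers full-batch gradient descent, which is a convenient baseline when comparing trajectories across batch fractions.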
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
