High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models
Zhou Fan, Leda Wang

TL;DR
This paper provides an exact asymptotic analysis of the learning dynamics of multi-pass mini-batch SGD in high-dimensional multi-index models, revealing the effects of batch size and learning rate scaling.
Contribution
It introduces a dynamical mean-field framework, driven by a scalar Poisson jump process, that characterizes the coordinate-wise dynamics of multi-pass SGD in high dimensions.
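One way to build intuition for why a Poisson process can appear as the limit of SGD sampling noise is to count how often each sample is selected across mini-batches. The sketch below is an illustration under assumptions, not the paper's construction: it assumes each mini-batch is drawn uniformly without replacement, independently across steps, and the names `selection_counts`, `n`, `batch_size`, and `num_steps` are hypothetical. When the batch size is small relative to the sample size, the per-sample count is binomial with a small success probability, so its mean and variance nearly coincide, as they would for a Poisson variable.

```python
import numpy as np

def selection_counts(n, batch_size, num_steps, seed=0):
    """Count how often each of n samples is drawn over num_steps mini-batches.

    Assumed sampling scheme (not taken from the paper): each mini-batch is
    drawn uniformly without replacement, independently of the other steps.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(n, dtype=np.int64)
    for _ in range(num_steps):
        batch = rng.choice(n, size=batch_size, replace=False)
        counts[batch] += 1  # indices within one batch are distinct
    return counts

n, batch_size = 20_000, 20
num_steps = 2 * n // batch_size        # two passes over the data in expectation
counts = selection_counts(n, batch_size, num_steps)

rate = num_steps * batch_size / n      # expected number of selections per sample
print("mean count:", counts.mean())
print("variance of counts:", counts.var())
print("fraction never selected:", np.mean(counts == 0), "vs exp(-rate):", np.exp(-rate))
```

A Poisson variable with this rate has variance equal to its mean and takes the value zero with probability exp(-rate), which is what the printed statistics can be compared against.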
Findings
The limiting SGD dynamics are the same for every sub-linear batch-size scaling, i.e. for every batch-size exponent in [0,1)
SGD, the Stochastic Modified Equation (SME), and gradient flow have distinct dynamics under certain scalings
The analysis recovers known results for gradient flow and online SGD in specific limits
Abstract
We study the learning dynamics of a multi-pass, mini-batch Stochastic Gradient Descent (SGD) procedure for empirical risk minimization in high-dimensional multi-index models with isotropic random data. In an asymptotic regime where the sample size $n$ and data dimension increase proportionally, for any sub-linear batch size of order $n^{\alpha}$ where $\alpha \in [0,1)$, and for a commensurate "critical" scaling of the learning rate, we provide an asymptotically exact characterization of the coordinate-wise dynamics of SGD. This characterization takes the form of a system of dynamical mean-field equations, driven by a scalar Poisson jump process that represents the asymptotic limit of SGD sampling noise. We develop an analogous characterization of the Stochastic Modified Equation (SME) which provides a Gaussian diffusion approximation to SGD. Our analyses imply that the…
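The setting described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's experimental setup: the tanh link, the squared loss, the noiseless labels, the problem sizes, and the step size `lr` are all arbitrary choices made here, and `lr` is not the paper's critical scaling. The helper names `predict`, `empirical_risk`, and `multipass_sgd` are hypothetical.

```python
import numpy as np

def predict(X, Theta):
    # Multi-index form: the output depends on X only through the k projections X @ Theta.
    return np.tanh(X @ Theta).sum(axis=1)

def empirical_risk(X, y, Theta):
    # Squared-loss empirical risk over the full fixed data set (assumed loss).
    return 0.5 * np.mean((predict(X, Theta) - y) ** 2)

def multipass_sgd(X, y, Theta0, batch_size, lr, num_steps, seed=0):
    """Mini-batch SGD that keeps resampling from the same n samples (multi-pass)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Theta = Theta0.copy()
    for _ in range(num_steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        Z = np.tanh(Xb @ Theta)                    # shape (batch, k)
        resid = Z.sum(axis=1) - yb                 # shape (batch,)
        # Gradient of the mini-batch squared loss with respect to Theta, shape (d, k).
        grad = Xb.T @ (resid[:, None] * (1.0 - Z ** 2)) / batch_size
        Theta -= lr * grad
    return Theta

rng = np.random.default_rng(1)
n, d, k = 2000, 100, 2                             # n and d of comparable order
X = rng.standard_normal((n, d))                    # isotropic Gaussian data
Theta_star = rng.standard_normal((d, k)) / np.sqrt(d)
y = predict(X, Theta_star)                         # noiseless labels (assumption)

Theta0 = rng.standard_normal((d, k)) / np.sqrt(d)
batch_size = 10
num_steps = 5 * n // batch_size                    # five passes in expectation
Theta_hat = multipass_sgd(X, y, Theta0, batch_size, lr=0.01, num_steps=num_steps)

print("risk at initialization:", empirical_risk(X, y, Theta0))
print("risk after SGD:", empirical_risk(X, y, Theta_hat))
```

Because the same `n` samples are revisited across steps, the iterates become correlated with the data, which is the feature that separates this multi-pass setting from online SGD with fresh samples at every step.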
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Gaussian Processes and Bayesian Inference
