Universality of high-dimensional scaling limits of stochastic gradient descent
Reza Gheissari, Aukosh Jagannath

TL;DR
This paper shows that the high-dimensional scaling limits of stochastic gradient descent (SGD) are universal across a broad class of data distributions: the limiting ODE matches the Gaussian case whenever the first two moments of the data match and the initialization is not coordinate aligned. It also identifies settings where universality fails.
Contribution
The paper proves that the ODE limits for SGD in high dimensions are universal across mixtures of product distributions, extending earlier results that were specific to Gaussian mixtures.
Findings
ODE limits are universal for mixtures with matching first two moments.
Universality fails when initialization is coordinate aligned.
Fluctuation limits are not universal across different distributions.
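One heuristic for why coordinate alignment matters (an intuition offered here, not the paper's proof): the loss sees the data only through projections onto a few vectors, and a projection of product-distributed data looks Gaussian only when the vector spreads its mass over many coordinates. The Python sketch below estimates the fourth moment of such a projection by Monte Carlo. The function names, the Rademacher choice, and the sizes are assumptions made for illustration.

```python
import numpy as np


def sample_gaussian(rng, shape):
    # i.i.d. standard normal entries: mean 0, variance 1.
    return rng.standard_normal(shape)


def sample_rademacher(rng, shape):
    # i.i.d. +/-1 entries: same first two moments as the Gaussian above.
    return rng.choice([-1.0, 1.0], size=shape)


def projection_fourth_moment(sampler, u, n_samples, rng):
    # Monte Carlo estimate of E[<a, u>^4] for data a with i.i.d. entries.
    a = sampler(rng, (n_samples, u.size))
    proj = a @ u
    return float(np.mean(proj ** 4))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n = 64, 50_000

    aligned = np.zeros(d)
    aligned[0] = 1.0                       # all mass on one coordinate
    delocalized = np.ones(d) / np.sqrt(d)  # mass spread over all coordinates

    for name, u in [("aligned", aligned), ("delocalized", delocalized)]:
        g = projection_fourth_moment(sample_gaussian, u, n, rng)
        r = projection_fourth_moment(sample_rademacher, u, n, rng)
        print(f"{name:12s} gaussian={g:.3f} rademacher={r:.3f}")
```

For the delocalized vector the two estimates should be close to each other, since the Rademacher projection is a normalized sum of many independent terms. For the aligned vector the Rademacher projection is exactly plus or minus one, so its fourth moment is 1 while the Gaussian value is 3. Any loss whose gradient depends on moments beyond the second can therefore behave differently in the aligned case.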
Abstract
We consider statistical tasks in high dimensions whose loss depends on the data only through its projection into a fixed-dimensional subspace spanned by the parameter vectors and certain ground truth vectors. This includes classifying mixture distributions with cross-entropy loss with one and two-layer networks, and learning single and multi-index models with one and two-layer networks. When the data is drawn from an isotropic Gaussian mixture distribution, it is known that the evolution of a finite family of summary statistics under stochastic gradient descent converges to an autonomous ordinary differential equation (ODE), as the dimension and sample size go to infinity and the step size goes to 0 commensurately. Our main result is that these ODE limits are universal in that this limit is the same whenever the data is drawn from mixtures of arbitrary product distributions whose…
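To make the universality claim concrete, here is a minimal Python sketch on a toy problem that is not taken from the paper: online SGD for noiseless linear regression (a single-index model with identity link), run once with Gaussian data and once with Rademacher data, which share the same first two moments. The step size delta/d, the delocalized ground truth, the zero initialization, and all function names are assumptions of this sketch. The closed-form curves in `ode_prediction` were derived for this toy only, using E[a a^T] = I and E[<a,u>^2 |a|^2] ≈ d |u|^2.

```python
import numpy as np


def sample_gaussian(rng, d):
    # i.i.d. standard normal coordinates.
    return rng.standard_normal(d)


def sample_rademacher(rng, d):
    # i.i.d. +/-1 coordinates: mean 0, variance 1, but not Gaussian.
    return rng.choice([-1.0, 1.0], size=d)


def run_online_sgd(sampler, d, delta, horizon, rng):
    """One pass of online SGD on the loss 0.5 * (<a, x> - <a, v>)^2.

    Returns the summary statistics at rescaled time `horizon`:
    m = <x, v> and e = |x - v|^2.
    """
    v = np.ones(d) / np.sqrt(d)  # delocalized unit-norm ground truth
    x = np.zeros(d)              # zero initialization (assumption of this toy)
    for _ in range(int(horizon * d)):  # time t corresponds to t * d steps
        a = sampler(rng, d)
        residual = a @ (x - v)
        x -= (delta / d) * residual * a  # step size scales like 1/d
    return float(x @ v), float(np.sum((x - v) ** 2))


def ode_prediction(delta, t):
    # Limiting curves for this toy, starting from m(0) = 0 and e(0) = 1:
    #   dm/dt = -delta * (m - 1),  de/dt = -(2 * delta - delta**2) * e
    m = 1.0 - np.exp(-delta * t)
    e = np.exp(-(2.0 * delta - delta ** 2) * t)
    return float(m), float(e)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, delta, horizon = 2000, 0.5, 4.0
    print("ODE       ", ode_prediction(delta, horizon))
    print("Gaussian  ", run_online_sgd(sample_gaussian, d, delta, horizon, rng))
    print("Rademacher", run_online_sgd(sample_rademacher, d, delta, horizon, rng))
```

At moderate dimension the two simulated runs should land near the same ODE values up to fluctuations that shrink as d grows, which is the behavior the abstract describes for summary statistics. The sketch only illustrates the statement; it does not reproduce the paper's models or proofs.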
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Statistical Mechanics and Entropy
