High-dimensional Limit of SGD for Diagonal Linear Networks
Begoña García Malaxechebarría, Courtney Paquette, Maryam Fazel, Dmitriy Drusvyatskiy

TL;DR
This paper analyzes the high-dimensional behavior of stochastic gradient descent on diagonal linear networks, deriving explicit stochastic differential equations and PDEs that describe the dynamics and convergence properties.
Contribution
It introduces a novel continuous dynamics approximation for SGD on diagonal linear networks in high dimensions, including explicit convergence and long-time behavior analysis.
Findings
In high dimensions, SGD dynamics are well approximated by an SDE that separates the drift from the gradient noise.
A deterministic PDE describes the evolution of observable statistics such as the risk and curvature.
The risk converges exponentially to zero with high probability.
Abstract
Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning. Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing the optimization and generalization properties of neural models. In this work, we show that in the high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the drift from the gradient noise. We further derive a deterministic partial differential equation whose solution propagates the relevant state of the iterates and characterizes the time evolution of a broad class of observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that, under a suitable parametrization, the stochastic…
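To make the setting concrete, here is a minimal sketch of one-sample SGD on a diagonal linear network. It is an illustration only, not the paper's setup: the parametrization beta = u**2 - v**2, the isotropic Gaussian inputs, the noiseless labels, the sparse target beta_star, and all hyperparameters (d, steps, lr, init) are assumptions chosen for the example. The function name run_sgd is hypothetical.

```python
import numpy as np


def run_sgd(d=200, steps=5000, lr=0.01, init=0.1, seed=0):
    """One-sample SGD on a diagonal linear network (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Hypothetical sparse ground truth; not taken from the paper.
    beta_star = np.zeros(d)
    beta_star[:5] = 1.0

    # Assumed parametrization: beta = u**2 - v**2, with u = v so beta starts at 0.
    u = np.full(d, init)
    v = np.full(d, init)

    risks = []
    for k in range(steps):
        if k % 100 == 0:
            beta = u**2 - v**2
            # Population risk for isotropic Gaussian inputs and noiseless labels.
            risks.append(0.5 * np.sum((beta - beta_star) ** 2))

        # Draw a fresh sample and compute the prediction error on it.
        x = rng.standard_normal(d)
        residual = x @ (u**2 - v**2) - x @ beta_star

        # Gradients of 0.5 * residual**2 with respect to u and v.
        grad_u = 2.0 * residual * x * u
        grad_v = -2.0 * residual * x * v
        u -= lr * grad_u
        v -= lr * grad_v

    return np.array(risks)


if __name__ == "__main__":
    risks = run_sgd()
    print("initial risk:", risks[0], "last recorded risk:", risks[-1])
```

The recorded risk curve is one example of an observable statistic of the iterates. Comparing such curves across increasing d is a simple way to see empirically whether they concentrate around a deterministic limit, which is the kind of behavior the abstract describes.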
