High-dimensional Limit of SGD for Diagonal Linear Networks

Begoña García Malaxechebarría; Courtney Paquette; Maryam Fazel; Dmitriy Drusvyatskiy

arXiv:2605.17177 · math.OC · May 19, 2026

TL;DR

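This paper analyzes the high-dimensional behavior of stochastic gradient descent (SGD) on diagonal linear networks. It derives a stochastic differential equation (SDE) that approximates the SGD dynamics and a deterministic partial differential equation (PDE) that tracks observable statistics of the iterates, and it establishes convergence guarantees.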

Contribution

It introduces a continuous-time approximation of SGD on diagonal linear networks in high dimensions, together with an analysis of convergence and long-time behavior.

Findings

1. SGD dynamics are well approximated by an SDE in high dimensions.
2. A deterministic PDE describes the evolution of observable statistics.
3. Exponential convergence to zero risk is proved to hold with high probability.

Abstract

Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning. Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing the optimization and generalization properties of neural models. In this work, we show that in the high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the drift from the gradient noise. We further derive a deterministic partial differential equation whose solution propagates the relevant state of the iterates and characterizes the time evolution of a broad class of observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that, under a suitable parametrization, the stochastic…
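The discrete process that the SDE is meant to approximate can be made concrete with a short simulation. This is a minimal sketch under assumptions not stated in the summary above: the parametrization β = u⊙u − v⊙v (one common choice for diagonal linear networks), isotropic Gaussian features, a noiseless sparse teacher β*, and a step size γ/d so that d steps correspond to one unit of continuous time. It is plain one-pass SGD, not the paper's SDE or PDE, and the function names and default values are hypothetical.

```python
import random


def population_risk(beta, beta_star):
    # Assumes features a ~ N(0, I_d) and noiseless targets y = <a, beta_star>, so
    # R(beta) = 0.5 * E[(<a, beta> - y)^2] = 0.5 * ||beta - beta_star||^2.
    return 0.5 * sum((b - bs) ** 2 for b, bs in zip(beta, beta_star))


def sgd_diagonal_linear_network(d=200, sparsity=10, gamma=0.25, alpha=0.5,
                                horizon=20.0, record_every=1.0, seed=0):
    """One-pass SGD on a diagonal linear network beta = u*u - v*v (hypothetical setup)."""
    rng = random.Random(seed)
    # Sparse teacher: the first `sparsity` coordinates equal 1, the rest 0.
    beta_star = [1.0 if i < sparsity else 0.0 for i in range(d)]
    # Balanced initialization u = v = alpha gives beta = 0 at time 0.
    u = [alpha] * d
    v = [alpha] * d
    eta = gamma / d  # step size scaled by 1/d: d steps make one unit of time t
    n_steps = int(horizon * d)
    record_stride = max(1, int(record_every * d))
    times, risks = [], []
    for k in range(n_steps + 1):
        beta = [ui * ui - vi * vi for ui, vi in zip(u, v)]
        if k % record_stride == 0:
            times.append(k / d)  # continuous time t = k / d
            risks.append(population_risk(beta, beta_star))
        if k == n_steps:
            break
        # Fresh sample at every step (one pass: no sample is reused).
        a = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(ai * bsi for ai, bsi in zip(a, beta_star))
        residual = sum(ai * bi for ai, bi in zip(a, beta)) - y
        # Gradients of 0.5 * residual^2: d/du_i = 2 r a_i u_i, d/dv_i = -2 r a_i v_i.
        for i in range(d):
            g = residual * a[i]
            u[i] -= eta * 2.0 * g * u[i]
            v[i] += eta * 2.0 * g * v[i]
    return times, risks


if __name__ == "__main__":
    times, risks = sgd_diagonal_linear_network()
    for t, r in zip(times[::5], risks[::5]):
        print(f"t={t:5.1f}  risk={r:.3e}")
```

Plotting the risk against t = k/d rather than against the raw step count k makes runs with different dimensions d comparable; this is the time scaling commonly used in high-dimensional analyses of SGD, and it is the regime in which a deterministic limit for statistics such as the risk can emerge.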
