From SGD to Spectra: A Theory of Neural Network Weight Dynamics

Brian Richard Olsen; Sam Fatehmanesh; Frank Xiao; Adarsh Kumarappan; Anirudh Gajula

arXiv:2507.12709·cs.LG·February 10, 2026

Open Access

TL;DR

This paper develops a stochastic differential equation (SDE) framework that connects the microscopic training dynamics of neural networks to the macroscopic spectral evolution of their weight matrices.

Contribution

The paper introduces a rigorous SDE-based model linking SGD dynamics to the spectral properties of weight matrices, which explains the 'bulk+tail' spectral structure observed in trained networks.

Findings

01

Squared singular values follow Dyson Brownian motion.

02

Stationary spectral distributions have gamma-type densities.

03

Model predictions match empirical spectral evolution in experiments.
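
For orientation, the two objects named in findings 01 and 02 have standard textbook forms, given below. This is only a reference sketch: the paper's own drift and diffusion coefficients for squared singular values are not reproduced on this page, so the symbols here (the inverse-temperature parameter \beta, the shape k, and the scale \theta) are assumptions taken from the generic definitions, not from the paper.

```latex
% Generic Dyson Brownian motion for ordered eigenvalues \lambda_1, ..., \lambda_N.
% The 1/(\lambda_i - \lambda_j) drift is the "eigenvalue repulsion" term.
d\lambda_i \;=\; \sqrt{\frac{2}{\beta}}\, dB_i
  \;+\; \sum_{j \neq i} \frac{dt}{\lambda_i - \lambda_j},
  \qquad i = 1, \dots, N .

% Gamma density with shape k > 0 and scale \theta > 0 on x > 0.
p(x) \;=\; \frac{x^{k-1}\, e^{-x/\theta}}{\Gamma(k)\, \theta^{k}} .
```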

Abstract

Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices. We derive exact SDEs showing that squared singular values follow Dyson Brownian motion with eigenvalue repulsion, and characterize stationary distributions as gamma-type densities with power-law tails, providing the first theoretical explanation for the empirically observed 'bulk+tail' spectral structure in trained networks. Through controlled experiments on transformer and MLP architectures, we validate our theoretical predictions and demonstrate quantitative agreement between SDE-based forecasts and observed spectral evolution, providing a rigorous…
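
To make the idea of a matrix-valued SDE driving a singular-value spectrum concrete, here is a minimal NumPy sketch. It is not the paper's model: it assumes a toy Ornstein-Uhlenbeck weight process, dW = -gamma * W dt + sigma dB, as a stand-in for SGD with weight decay and isotropic gradient noise, and integrates it with the Euler-Maruyama scheme. The function name and all parameters (`gamma`, `sigma`, `dt`, the matrix shape) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_squared_singular_values(n_rows=20, n_cols=10, steps=1000,
                                     dt=0.01, gamma=1.0, sigma=1.0, seed=0):
    """Track squared singular values of W under a toy matrix-valued SDE.

    Assumed dynamics (illustrative only, not taken from the paper):
        dW = -gamma * W dt + sigma dB
    integrated with Euler-Maruyama, starting from W = 0.
    Returns an array of shape (steps + 1, min(n_rows, n_cols)); each row
    holds the squared singular values at one time step, sorted descending.
    """
    rng = np.random.default_rng(seed)
    W = np.zeros((n_rows, n_cols))
    k = min(n_rows, n_cols)
    history = np.empty((steps + 1, k))
    history[0] = np.linalg.svd(W, compute_uv=False) ** 2

    for t in range(1, steps + 1):
        # One Euler-Maruyama step: deterministic decay plus Gaussian noise.
        noise = rng.standard_normal((n_rows, n_cols))
        W = W - gamma * W * dt + sigma * np.sqrt(dt) * noise
        # np.linalg.svd returns singular values in descending order.
        history[t] = np.linalg.svd(W, compute_uv=False) ** 2

    return history


if __name__ == "__main__":
    hist = simulate_squared_singular_values()
    print("final squared singular values:", np.round(hist[-1], 2))
    print("smallest gap between neighbours:", np.min(-np.diff(hist[-1])))
```

In this toy setting the entries of W relax to independent Gaussians with variance close to sigma^2 / (2 * gamma), so the average squared singular value settles near n_rows * sigma^2 / (2 * gamma). Plotting the rows of `hist` over time shows the trajectories spreading apart rather than crossing, which is the qualitative repulsion effect the abstract describes.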

Taxonomy

Topics: Neural Networks and Applications