Spectral alignment of stochastic gradient descent for high-dimensional classification tasks
Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath

TL;DR
This paper proves that, for high-dimensional mixture classification with 1- and 2-layer neural networks, the SGD trajectory and the outlier eigenspaces of the empirical Hessian and gradient matrices align with a common low-dimensional subspace, and that in multi-layer settings this alignment holds layer by layer.
Contribution
It gives rigorous proofs linking SGD trajectories to the spectra of empirical Hessian and gradient matrices, establishing some of the predictions that numerical studies of overparametrized networks have produced over the last decade.
Findings
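The SGD trajectory and the emergent outlier eigenspaces of the Hessian and gradient matrices align with a common low-dimensional subspace
In multi-layer settings the alignment occurs per layer; the final layer's outlier eigenspace evolves over the course of training and becomes rank deficient when SGD converges to a sub-optimal classifier
The results are proved for two canonical classification tasks on multi-class high-dimensional mixtures, using 1-layer and 2-layer networks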
Abstract
We rigorously study the relation between the training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multi-class high-dimensional mixtures and either 1 or 2-layer neural networks, both the SGD trajectory and emergent outlier eigenspaces of the Hessian and gradient matrices align with a common low-dimensional subspace. Moreover, in multi-layer settings this alignment occurs per layer, with the final layer's outlier eigenspace evolving over the course of training, and exhibiting rank deficiency when the SGD converges to sub-optimal classifiers. This establishes some of the rich predictions that have arisen from extensive numerical studies in the last decade about the spectra of Hessian and information matrices over the course of training in overparametrized networks.
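The alignment described in the abstract can be illustrated numerically with a small sketch. The code below is not the authors' construction: it assumes a toy setup chosen purely for illustration (a 3-class Gaussian mixture with orthogonal means, a 1-layer softmax classifier initialized at zero, online SGD with a fixed step size, and hypothetical function names and parameter values). It measures what fraction of each weight row, and of each top eigenvector of one diagonal block of the empirical Hessian, lies in the span of the class means.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax along the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sample_mixture(rng, means, n, noise_std):
    # Draw n points from an equal-weight Gaussian mixture with isotropic noise.
    k, d = means.shape
    labels = rng.integers(0, k, size=n)
    x = means[labels] + noise_std * rng.standard_normal((n, d))
    return x, labels


def span_fraction(v, basis):
    # Fraction of the squared norm of v inside span(basis);
    # basis has orthonormal columns, shape (d, k).
    v = v / np.linalg.norm(v)
    return float(np.sum((basis.T @ v) ** 2))


def run_experiment(d=50, k=3, mean_norm=3.0, noise_std=0.5,
                   lr=0.01, steps=2000, n_hess=4000, seed=0):
    # All parameter values here are illustrative assumptions.
    rng = np.random.default_rng(seed)

    # Orthogonal class means along the first k coordinate axes.
    means = np.zeros((k, d))
    means[np.arange(k), np.arange(k)] = mean_norm
    basis, _ = np.linalg.qr(means.T)  # orthonormal basis of the mean span

    # Online SGD on the cross-entropy loss: one fresh sample per step.
    W = np.zeros((k, d))
    for _ in range(steps):
        x, y = sample_mixture(rng, means, 1, noise_std)
        grad_logits = softmax(W @ x[0])
        grad_logits[y[0]] -= 1.0  # p - onehot(y)
        W -= lr * np.outer(grad_logits, x[0])

    # Empirical Hessian block for the class-0 weights on fresh data:
    # H_00 = (1/n) * sum_i p_0(x_i) * (1 - p_0(x_i)) * x_i x_i^T
    X, _ = sample_mixture(rng, means, n_hess, noise_std)
    p0 = softmax(X @ W.T)[:, 0]
    H00 = (X * (p0 * (1.0 - p0))[:, None]).T @ X / n_hess

    eigvals, eigvecs = np.linalg.eigh(H00)  # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    return {
        "weight_alignment": [span_fraction(W[c], basis) for c in range(k)],
        "hessian_alignment": [span_fraction(eigvecs[:, j], basis)
                              for j in range(k)],
        "eigvals": eigvals,
    }


if __name__ == "__main__":
    result = run_experiment()
    print("weight rows in mean span:   ", result["weight_alignment"])
    print("top Hessian eigvecs in span:", result["hessian_alignment"])
    print("leading eigenvalues:        ", result["eigvals"][:5])
```

In this toy setting, values of `weight_alignment` and `hessian_alignment` close to 1 would indicate that the SGD iterate and the top eigenvectors of the Hessian block both concentrate on the k-dimensional span of the class means. The sketch covers only the 1-layer case; the per-layer alignment and the rank deficiency at sub-optimal classifiers that the abstract describes for 2-layer networks are not reproduced here.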
Taxonomy
Topics: Face and Expression Recognition · Statistical Mechanics and Entropy · Neural Networks and Applications
Methods: Stochastic Gradient Descent
