Spectral Edge Dynamics: An Analytical-Empirical Study of Phase Transitions in Neural Network Training
Yongzhong Xu

TL;DR
This paper introduces spectral edge analysis to understand phase transitions in neural network training, linking spectral gaps to learning dynamics and predicting critical points across various models.
Contribution
It develops a novel spectral gap framework that explains neural training phase transitions and provides quantitative predictions validated on multiple models.
Findings
Spectral gap dynamics precede all grokking events.
Gap position varies with optimizer.
Most quantitative predictions are confirmed.
Abstract
We develop spectral edge analysis: phase transitions in neural network training -- grokking, capability gains, loss plateaus -- are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In the extreme aspect ratio regime (far more parameters than window steps), the classical BBP detection threshold is vacuous; the operative structure is the intra-signal gap separating dominant from subdominant modes. From three assumptions we derive: (i) gap dynamics governed by a Dyson-type ODE with curvature asymmetry, damping, and gradient driving; (ii) a spectral loss decomposition linking each mode's learning contribution to its Davis--Kahan stability coefficient; (iii) the Gap Maximality Principle, showing that the position of this gap is the unique dynamically privileged position -- its collapse is the only one…
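As a rough illustration of the central object in the abstract, the rolling-window Gram matrix of parameter updates and its spectral gap might be computed as in the sketch below. This is not the paper's code. The function names `rolling_gram_spectrum` and `largest_gap`, the use of flattened updates stacked as rows, and the choice to measure the gap as the largest ratio of consecutive eigenvalues are all assumptions made here for illustration; the paper's exact gap definition is not given in this excerpt.

```python
import numpy as np

def rolling_gram_spectrum(updates, window):
    """Eigenvalues of the rolling-window Gram matrix of parameter updates.

    updates: array of shape (T, P), one flattened parameter update per step
             (hypothetical layout, assumed for this sketch).
    window:  number of consecutive steps per window (W << P is the
             extreme-aspect-ratio regime).
    Returns a list of length T - window + 1; each entry holds the W
    eigenvalues of the W x W Gram matrix, sorted in descending order.
    """
    updates = np.asarray(updates, dtype=float)
    n_steps = updates.shape[0]
    spectra = []
    for end in range(window, n_steps + 1):
        block = updates[end - window:end]      # (W, P) slice of recent updates
        gram = block @ block.T                 # (W, W) Gram matrix, cheap when W << P
        eigvals = np.linalg.eigvalsh(gram)     # ascending, real (gram is symmetric)
        spectra.append(eigvals[::-1])          # store in descending order
    return spectra

def largest_gap(eigvals, floor=1e-12):
    """Locate the largest ratio between consecutive eigenvalues.

    Assumption of this sketch: the gap at position k is eigvals[k-1] / eigvals[k]
    (1-indexed k), so k counts the modes on the dominant side of the gap.
    Returns (k, ratio).
    """
    clipped = np.maximum(np.asarray(eigvals, dtype=float), floor)  # guard against zeros
    ratios = clipped[:-1] / clipped[1:]
    k = int(np.argmax(ratios)) + 1
    return k, float(ratios[k - 1])
```

Working with the W x W Gram matrix rather than the P x P covariance keeps each eigendecomposition small when the window is much shorter than the parameter count. Tracking `largest_gap` over successive windows gives a time series of gap position and size that can be compared against training events such as grokking.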
