Spectral Edge Dynamics: An Analytical-Empirical Study of Phase Transitions in Neural Network Training

Yongzhong Xu

arXiv:2603.28964·cs.LG·May 8, 2026

TL;DR

This paper introduces spectral edge analysis to understand phase transitions in neural network training, linking spectral gaps to learning dynamics and predicting critical points across various models.

Contribution

It develops a novel spectral gap framework that explains neural training phase transitions and provides quantitative predictions validated on multiple models.

Findings

01 Spectral gap dynamics precede all grokking events.

02 Gap position varies with optimizer.

03 Most quantitative predictions are confirmed.

Abstract

We develop the spectral edge analysis: phase transitions in neural network training -- grokking, capability gains, loss plateaus -- are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In the extreme aspect ratio regime (parameters $P \sim 10^{8}$, window $W \sim 10$), the classical BBP detection threshold is vacuous; the operative structure is the intra-signal gap separating dominant from subdominant modes at position $k^{*} = \arg\max_{j} \sigma_{j} / \sigma_{j+1}$. From three assumptions we derive: (i) gap dynamics governed by a Dyson-type ODE with curvature asymmetry, damping, and gradient driving; (ii) a spectral loss decomposition linking each mode's learning contribution to its Davis--Kahan stability coefficient; (iii) the Gap Maximality Principle, showing that $k^{*}$ is the unique dynamically privileged position -- its collapse is the only one…
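
To make the gap position concrete, the following is a minimal sketch of how $k^{*}$ could be computed from a window of parameter updates. It is not the paper's implementation. It assumes the window is stored as a $W \times P$ array with one flattened update per row, that $\sigma_{j}$ are the singular values of that array (the square roots of the Gram matrix eigenvalues) sorted in descending order, and that $j$ is 1-indexed. The function name `gap_position` and the `eps` guard are hypothetical.

```python
import numpy as np


def gap_position(updates, eps=1e-12):
    """Return (k_star, sigma, ratios) for a window of parameter updates.

    updates: array of shape (W, P), one flattened update per row (assumed layout).
    k_star is the 1-indexed j that maximizes sigma_j / sigma_{j+1}.
    """
    updates = np.asarray(updates, dtype=np.float64)
    # W x W Gram matrix: cheap to form when W << P.
    gram = updates @ updates.T
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
    eigvals = np.linalg.eigvalsh(gram)[::-1]
    # Singular values of `updates`; clip tiny negative round-off before the sqrt.
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))
    # Consecutive ratios sigma_j / sigma_{j+1}; eps avoids division by zero.
    ratios = sigma[:-1] / np.maximum(sigma[1:], eps)
    k_star = int(np.argmax(ratios)) + 1  # convert 0-based index to 1-based j
    return k_star, sigma, ratios
```

Working with the $W \times W$ Gram matrix rather than the full $W \times P$ array keeps the eigenvalue problem small when $W \sim 10$, whatever the parameter count. In a rolling-window setting, one would call this function each time the oldest update is dropped and the newest is appended, and track `k_star` and `ratios[k_star - 1]` over training steps.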
