Spectral Edge Dynamics of Training Trajectories: Signal--Noise Geometry Across Scales
Yongzhong Xu

TL;DR
This paper introduces Spectral Edge Dynamics (SED), a method to analyze training trajectories of large models by identifying a spectral boundary that separates coherent optimization directions from noise, revealing universal patterns across scales.
Contribution
The paper presents SED, a novel spectral analysis framework that uncovers universal geometric patterns in training trajectories and relates spectral geometry to model complexity and generalization signals.
Findings
The spectral edge exhibits a universal three-phase pattern (rise, plateau, collapse) during training.
Effective signal rank adapts to task complexity, e.g., 2 at 51M and 3 at 124M parameters.
Spectral gap preservation enables analysis of models at arbitrary scale.
Abstract
Despite hundreds of millions of parameters, transformer training trajectories evolve within only a few coherent directions. We introduce Spectral Edge Dynamics (SED) to quantify this structure: a rolling-window SVD of parameter updates reveals a sharp boundary -- the spectral edge -- between coherent optimization directions and stochastic noise, identified via the maximum consecutive singular value ratio $\sigma_k / \sigma_{k+1}$. Across a 51M-parameter TinyStories model (4 seeds) and GPT-2 124M under distribution shift, the spectral edge exhibits a universal three-phase pattern (rise, plateau, collapse). The effective signal rank adapts to task complexity (2 at 51M, 3 at 124M), and the directional coupling between spectral geometry and validation loss reverses with window size -- a lag flip reflecting the timescale of trajectory integration. Johnson--Lindenstrauss…
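The edge-detection step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `spectral_edge` and `rolling_spectral_edge`, the flattened `(T, D)` checkpoint array, and the `window` argument are all assumptions made for illustration. The paper may also normalize, project, or subsample the updates differently.

```python
import numpy as np


def spectral_edge(updates: np.ndarray) -> tuple[int, float]:
    """Locate the largest gap between consecutive singular values.

    `updates` is a (W, D) matrix whose rows are parameter updates inside one
    window (hypothetical layout). Returns (k, ratio), where k is the number
    of singular values above the gap and ratio = sigma_k / sigma_{k+1}.
    """
    # Singular values only, returned in descending order.
    s = np.linalg.svd(updates, compute_uv=False)
    # Guard against division by (near-)zero trailing singular values.
    ratios = s[:-1] / np.maximum(s[1:], 1e-12)
    k = int(np.argmax(ratios)) + 1  # 1-based count of "signal" directions
    return k, float(ratios[k - 1])


def rolling_spectral_edge(checkpoints: np.ndarray, window: int) -> list[tuple[int, float]]:
    """Apply `spectral_edge` to every window of consecutive updates.

    `checkpoints` is a (T, D) array of flattened parameter vectors
    (an assumption; real checkpoints would need to be flattened first).
    """
    deltas = np.diff(checkpoints, axis=0)  # (T - 1, D) parameter updates
    return [
        spectral_edge(deltas[start:start + window])
        for start in range(deltas.shape[0] - window + 1)
    ]


if __name__ == "__main__":
    # Synthetic trajectory: two strong shared directions plus weak noise.
    rng = np.random.default_rng(0)
    steps, dim = 40, 500
    directions, _ = np.linalg.qr(rng.normal(size=(dim, 2)))  # orthonormal columns
    signal = 5.0 * rng.normal(size=(steps, 2)) @ directions.T
    noise = 0.01 * rng.normal(size=(steps, dim))
    checkpoints = np.vstack([np.zeros(dim), np.cumsum(signal + noise, axis=0)])
    for k, ratio in rolling_spectral_edge(checkpoints, window=10)[:3]:
        print(f"edge position k={k}, ratio={ratio:.1f}")
```

In this sketch the effective signal rank is read off as the position of the largest ratio. Tracking that ratio across windows is one plausible way to observe the rise, plateau, and collapse phases the abstract reports.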
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science · Gaussian Processes and Bayesian Inference
