Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability
Bum Jun Kim, Shohei Taniguchi, Makoto Kawano, Yusuke Iwasawa, Yutaka Matsuo

TL;DR
This paper introduces Residual Koopman Spectral Profiling (RKSP), a method to predict and prevent training divergence in transformers by analyzing spectral features at initialization, significantly improving stability and training efficiency.
Contribution
The paper presents RKSP, a novel spectral diagnostic that predicts transformer instability at initialization, and introduces Koopman Spectral Shaping (KSS) to prevent divergence during training.
Findings
RKSP achieves an AUROC of 0.995 in predicting divergence.
KSS reduces the divergence rate from 66.7% to 12.5%.
The method generalizes across various models and tasks.
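For readers unfamiliar with the AUROC metric quoted above, the following is a minimal sketch of how such a score could be computed from per-run risk scores and divergence labels. The function name `auroc` and the toy numbers are hypothetical and are not taken from the paper; the sketch uses the standard pairwise (Mann-Whitney) definition, in which AUROC is the probability that a randomly chosen diverged run receives a higher score than a randomly chosen stable run.

```python
import numpy as np

def auroc(scores, diverged):
    """Pairwise (Mann-Whitney) AUROC; ties count as half a win.

    scores   : risk score per run (higher = predicted more unstable)
    diverged : truthy where the run actually diverged
    """
    scores = np.asarray(scores, dtype=float)
    diverged = np.asarray(diverged, dtype=bool)
    pos = scores[diverged]      # scores of runs that diverged
    neg = scores[~diverged]     # scores of runs that stayed stable
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one diverged and one stable run")
    diff = pos[:, None] - neg[None, :]          # all (diverged, stable) pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

# Hypothetical toy data: four runs, two of which diverged.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_diverged = [0, 0, 1, 1]
toy_auroc = auroc(toy_scores, toy_diverged)
```

An AUROC of 1.0 would mean every diverged run is ranked above every stable run, while 0.5 corresponds to chance-level ranking.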
Abstract
Training divergence in transformers wastes compute, yet practitioners discover instability only after expensive runs begin. They therefore need an expected probability of failure for a transformer before training starts. Our study of Residual Koopman Spectral Profiling (RKSP) provides such an estimate. From a single forward pass at initialization, RKSP extracts Koopman spectral features by applying whitened dynamic mode decomposition to layer-wise residual snapshots. Our central diagnostic, the near-unit spectral mass, quantifies the fraction of modes concentrated near the unit circle, which captures instability risk. For predicting divergence across extensive configurations, this estimator achieves an AUROC of 0.995, outperforming the best gradient baseline. We further make this diagnostic actionable through Koopman Spectral Shaping (KSS), which reshapes spectra during training. We…
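The pipeline described in the abstract (layer-wise residual snapshots, whitened dynamic mode decomposition, near-unit spectral mass) can be illustrated with a rough sketch. Everything specific here is an assumption rather than the authors' implementation: the function names, the choice of whitening (rescaling by inverse singular values of the input snapshots), the rank tolerance, and the band width `eps` around the unit circle. The toy data come from a known linear map, not from a transformer.

```python
import numpy as np

def whitened_dmd_eigs(states, rank_tol=1e-10):
    """Estimate Koopman eigenvalues from layer-wise snapshots.

    states : array of shape (num_layers + 1, num_tokens, dim), where
             states[l] holds the residual-stream vectors entering layer l.
    """
    dim = states.shape[-1]
    X = states[:-1].reshape(-1, dim).T   # inputs to each layer, dim x m
    Y = states[1:].reshape(-1, dim).T    # matching outputs,     dim x m
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(S > rank_tol * S[0]))          # numerical rank
    U_r, S_r, V_r = U[:, :r], S[:r], Vt[:r].T
    # Operator in whitened coordinates z = S^{-1} U^T x (assumed whitening):
    # K = S^{-1} U^T Y V, which is similar to the usual reduced DMD operator.
    K = (U_r.T @ Y @ V_r) / S_r[:, None]
    return np.linalg.eigvals(K)

def near_unit_spectral_mass(eigs, eps=0.05):
    """Fraction of eigenvalues whose modulus lies within eps of 1."""
    return float(np.mean(np.abs(np.abs(eigs) - 1.0) < eps))

# Toy check on a known linear system h_{l+1} = A h_l (hypothetical data).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([1.0, 0.98, 0.5, 0.2]) @ Q.T
h = rng.standard_normal((8, 4))          # 8 "tokens", dimension 4
layers = [h]
for _ in range(6):                       # 6 "layers"
    h = h @ A.T
    layers.append(h)
states = np.stack(layers)                # shape (7, 8, 4)

eigs = whitened_dmd_eigs(states)
mass = near_unit_spectral_mass(eigs, eps=0.05)
# By construction, two of the four eigenvalues (1.0 and 0.98) fall in the band.
```

In an actual model, `states` would be gathered from one forward pass at initialization, for example with forward hooks on each residual block; how the snapshots are collected and which threshold on the mass signals risk are not specified in the text above.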
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Graph Neural Networks · Advanced Neural Network Applications
