A Mechanism Study of Delayed Loss Spikes in Batch-Normalized Linear Models
Peifeng Gao, Wenyi Fang, Yang Zheng, Difan Zou

TL;DR
This paper analyzes how batch normalization can delay the onset of loss spikes in neural network training by examining simplified linear models and deriving explicit conditions for delayed instability.
Contribution
It provides a theoretical analysis of delayed loss spikes in batch-normalized linear models, deriving explicit conditions and mechanisms for delayed instability onset.
Findings
Derived no-rising-edge and delayed-onset conditions for whitened linear regression.
Bounded the waiting time to directional onset of instability.
Showed that rising edges self-stabilize within finitely many iterations.
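The setting behind these findings can be illustrated with a minimal sketch. It assumes a batch-normalized linear model on whitened inputs reduces to f(x) = gamma * <w, x> / ||w||; this parameterization, and the names `bn_linear_loss`, `bn_linear_grad`, and `gd_step`, are illustrative assumptions and are not taken from the paper. The sketch does not reproduce the paper's conditions or bounds. It only shows the scale invariance that makes an "effective learning rate" meaningful: the loss ignores the length of w, the gradient is orthogonal to w, and the gradient at c*w is 1/c times the gradient at w, so a fixed step size eta moves the direction of w as if the step size were eta / ||w||^2.

```python
import math


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def bn_linear_loss(w, gamma, X, y):
    """Mean of 0.5 * (f(x) - y)^2 with f(x) = gamma * <w, x> / ||w||.

    Assumption: inputs are whitened, so the batch-norm standard deviation
    of <w, x> is taken to be ||w||.
    """
    n = _norm(w)
    return sum(0.5 * (gamma * _dot(w, x) / n - t) ** 2 for x, t in zip(X, y)) / len(X)


def bn_linear_grad(w, gamma, X, y):
    """Analytic gradient of bn_linear_loss with respect to w."""
    n = _norm(w)
    g = [0.0] * len(w)
    for x, t in zip(X, y):
        wx = _dot(w, x)
        residual = gamma * wx / n - t
        for i in range(len(w)):
            # d/dw_i of gamma * <w, x> / ||w||
            dpred = gamma * (x[i] / n - wx * w[i] / n ** 3)
            g[i] += residual * dpred / len(X)
    return g


def gd_step(w, gamma, X, y, eta):
    """One plain gradient-descent step on w with fixed step size eta."""
    g = bn_linear_grad(w, gamma, X, y)
    return [wi - eta * gi for wi, gi in zip(w, g)]
```

Because the gradient is orthogonal to w, each plain gradient step in this sketch can only increase ||w||. How the effective learning rate evolves in the paper's analysis depends on details (for example the scale parameter or regularization) that this summary does not state.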
Abstract
Delayed loss spikes have been reported in neural-network training, but existing theory mainly explains earlier non-monotone behavior caused by overly large fixed learning rates. We study one stylized hypothesis: normalization can postpone instability by gradually increasing the effective learning rate during otherwise stable descent. To test this hypothesis at theorem level, we analyze batch-normalized linear models. Our flagship result concerns whitened square-loss linear regression, where we derive explicit no-rising-edge and delayed-onset conditions, bound the waiting time to directional onset, and show that the rising edge self-stabilizes within finitely many iterations. Combined with a square-loss decomposition, this yields a concrete delayed-spike mechanism in the whitened regime. For logistic regression, under highly restrictive active-margin assumptions, we prove only a…
