Loss Spike in Training Neural Networks
Xiaolong Li, Zhi-Qin John Xu, Zhongwang Zhang

TL;DR
This paper investigates the causes of loss spikes during neural network training, relating them to loss landscape sharpness, frequency components, and weight condensation, and questions the use of the maximum Hessian eigenvalue as a generalization measure.
Contribution
It provides a detailed analysis of loss spikes, links them to loss landscape properties, and explores their impact on weight condensation and generalization measures.
Findings
Loss spikes are linked to sharp regions in the loss landscape.
The rapid descent of the loss after a spike is driven primarily by low-frequency components.
Loss spikes facilitate weight condensation and correlate with the maximum Hessian eigenvalue.
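The maximum Hessian eigenvalue mentioned above is commonly estimated with power iteration on Hessian-vector products. The following is a minimal sketch of that general technique, not the authors' code. The toy quadratic loss, the matrix `A`, and the helper names `grad`, `hvp`, and `lambda_max` are all assumptions made for illustration. Power iteration returns the eigenvalue of largest magnitude, which equals the maximum eigenvalue here because the toy Hessian is positive definite.

```python
import math

# Hypothetical toy loss L(w) = 0.5 * w^T A w; its Hessian is A everywhere.
A = [[3.0, 1.0], [1.0, 2.0]]

def grad(w):
    """Gradient of the toy loss, A @ w."""
    return [sum(A[i][j] * w[j] for j in range(2)) for i in range(2)]

def hvp(w, v, eps=1e-4):
    """Hessian-vector product H @ v by finite-differencing the gradient."""
    g0 = grad(w)
    g1 = grad([wi + eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

def lambda_max(w, iters=100):
    """Power iteration: repeatedly apply H to v and renormalize."""
    v = [1.0, 0.0]
    lam = 0.0
    for _ in range(iters):
        hv = hvp(w, v)
        lam = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(a * a for a in hv))
        v = [a / norm for a in hv]
    return lam, v

lam, v = lambda_max([0.5, -0.3])
print(f"estimated max Hessian eigenvalue: {lam:.4f}")
```

For this 2x2 example the exact value is (5 + sqrt(5)) / 2, so the estimate can be checked by hand. In a real network the finite-difference `hvp` would typically be replaced by an automatic-differentiation Hessian-vector product.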
Abstract
In this work, we investigate the mechanism underlying loss spikes observed during neural network training. When the training enters a region with a lower-loss-as-sharper (LLAS) structure, the training becomes unstable, and the loss exponentially increases once the loss landscape is too sharp, resulting in the rapid ascent of the loss spike. The training stabilizes when it finds a flat region. From a frequency perspective, we explain the rapid descent in loss as being primarily influenced by low-frequency components. We observe a deviation in the first eigendirection, which can be reasonably explained by the frequency principle, as low-frequency information is captured rapidly, leading to the rapid descent. Inspired by our analysis of loss spikes, we revisit the link between the maximum eigenvalue of the loss Hessian (λ_max), flatness and generalization. We suggest…
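The exponential loss increase in an overly sharp region can be illustrated with the standard stability analysis of gradient descent on a quadratic. This is a sketch under assumptions rather than the paper's experiment: the one-dimensional loss L(w) = 0.5 * s * w^2, the function name `gd_losses`, and the chosen values are all hypothetical. Each step multiplies w by (1 - lr * s), so the loss is multiplied by (1 - lr * s)^2 and grows geometrically once the sharpness s exceeds 2 / lr.

```python
def gd_losses(sharpness, lr, steps=10, w0=1.0):
    """Run gradient descent on L(w) = 0.5 * sharpness * w**2 and record the loss."""
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * sharpness * w * w)
        w -= lr * sharpness * w  # gradient of the toy loss is sharpness * w
    return losses

lr = 0.1                      # stability threshold for this toy loss is 2 / lr = 20
flat = gd_losses(5.0, lr)     # sharpness below the threshold: loss decays
sharp = gd_losses(25.0, lr)   # sharpness above the threshold: loss grows each step

print("flat :", [f"{x:.3g}" for x in flat])
print("sharp:", [f"{x:.3g}" for x in sharp])
```

With these values the per-step loss ratio is 0.25 in the flat case and 2.25 in the sharp case, which matches the qualitative picture of a rapid ascent followed by stabilization once training reaches a flatter region.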
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Model Reduction and Neural Networks
Methods: Test
