Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes

Liu Hanqing; Jianjun Cao; Yuanze Li; Zijian Zhou

arXiv:2605.06152 · cs.LG · May 13, 2026

TL;DR

This paper shows that the periodic loss spikes seen in long, unregularized training of deep neural networks (the Slingshot Mechanism) are caused by floating-point precision limits. These limits set off a positive feedback loop that the authors call Numerical Feature Inflation.

Contribution

It provides a theoretical explanation linking low-precision arithmetic to Slingshot spikes, highlighting a new numerical dynamic in late-stage training.

Findings

1. Loss spikes are triggered by floating-point rounding errors.
2. Numerical Feature Inflation causes exponential growth in the classifier and feature means.
3. Partial absorption of logits can still cause parameter divergence without visible loss spikes.
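To give a sense of scale for the rounding errors in the first finding, here is a back-of-the-envelope sketch. It is an illustration, not the paper's definition of the absorption-error threshold. After the max logit is subtracted, an incorrect class contributes exp(-margin) next to the correct class's 1.0. Once that term falls to half an ulp of 1.0, adding it to 1.0 returns exactly 1.0. The sketch assumes round-to-nearest-even and assumes each incorrect-class term is added to 1.0 on its own. `absorption_margin` and `to_f32` are hypothetical helper names.

```python
import math
import struct

# Significand precision in bits (including the implicit leading bit).
SIGNIFICAND_BITS = {"bfloat16": 8, "float16": 11, "float32": 24, "float64": 53}

def absorption_margin(bits: int) -> float:
    """Smallest logit margin m with exp(-m) <= 2**-bits (half an ulp of 1.0).

    Under round-to-nearest-even, 1.0 + exp(-m) then rounds back to exactly 1.0.
    Assumes each incorrect-class term is added to 1.0 on its own.
    """
    return bits * math.log(2.0)

def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

for name, bits in SIGNIFICAND_BITS.items():
    print(f"{name:>8}: margin >= {absorption_margin(bits):.2f}")

# Spot check in float32: just past the margin, the small term vanishes.
m32 = absorption_margin(SIGNIFICAND_BITS["float32"])
print(to_f32(1.0 + math.exp(-(m32 + 0.01))) == 1.0)  # True
```

The threshold grows linearly with significand width. Roughly 5.5 logits of margin suffice in bfloat16, while float64 needs about 37. This is consistent with low precision making the effect reachable in practice.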

Abstract

Deep neural networks exhibit periodic loss spikes during long, unregularized training, a phenomenon known as the "Slingshot Mechanism." Existing work usually attributes it to intrinsic optimization dynamics, but its trigger has remained unclear. This paper proves that the phenomenon results from the limited precision of floating-point arithmetic. As training enters a high-confidence stage, the gap between the correct-class logit and the other logits may exceed the absorption-error threshold. Then, during backpropagation, the gradient of the correct class is rounded to exactly zero while the gradients of the incorrect classes remain nonzero. This breaks the zero-sum constraint on gradients across classes and introduces a systematic drift into the parameter updates of the classifier layer. We prove that this drift forms a positive feedback loop with the feature, causing the…
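The core step of the abstract can be reproduced in a few lines. This is a hypothetical illustration, not the authors' code. The names `softmax_ce_grad`, `to_f32`, `to_f64`, and `classifier_mean_update` are introduced here. The sketch computes the cross-entropy gradient with respect to the logits, p - onehot(y), and rounds every intermediate either to float64 (Python's native float) or to float32 (emulated with `struct`). Where each rounding happens is an assumption, since fused or mixed-precision kernels round at different points. At a margin of 17, float64 keeps a small negative correct-class gradient, and the gradients sum to roughly zero. Float32 instead rounds the correct-class gradient to exactly 0.0, and the sum becomes positive. The last helper shows the consequence for the classifier. Since dL/dW = g h^T, one SGD step moves the mean row of W by -lr * mean(g) * h, and that shift is zero only when the gradients sum to zero.

```python
import math
import struct
from typing import Callable, List

def to_f32(x: float) -> float:
    """Round a Python float (IEEE float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def to_f64(x: float) -> float:
    """Identity: Python floats are already float64."""
    return x

def softmax_ce_grad(logits: List[float], target: int,
                    rnd: Callable[[float], float]) -> List[float]:
    """Gradient of softmax cross-entropy w.r.t. the logits, p - onehot(target).

    Every intermediate is passed through `rnd` to emulate storing it in a
    given format (an assumption; real kernels round at other points).
    """
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [rnd(math.exp(rnd(z - m))) for z in logits]
    denom = 0.0
    for e in exps:  # naive left-to-right accumulation
        denom = rnd(denom + e)
    probs = [rnd(e / denom) for e in exps]
    return [rnd(p - (1.0 if i == target else 0.0)) for i, p in enumerate(probs)]

def classifier_mean_update(grad: List[float], h: List[float], lr: float) -> List[float]:
    """Change in the mean row of W (C x d) after one SGD step W -= lr * grad h^T."""
    mean_g = sum(grad) / len(grad)
    return [-lr * mean_g * hk for hk in h]

logits = [17.0, 0.0, 0.0]  # correct class 0 leads the others by a margin of 17
g64 = softmax_ce_grad(logits, 0, to_f64)
g32 = softmax_ce_grad(logits, 0, to_f32)
print(g64[0], sum(g64))  # small negative gradient; sum is zero up to rounding
print(g32[0], sum(g32))  # correct-class gradient is exactly 0.0; sum is positive
print(classifier_mean_update(g32, [1.0, 2.0], lr=0.1))  # mean row drifts along -h
```

The sketch stops at this first step. It does not model the feedback loop between the classifier and the feature that the paper goes on to analyze. Float64 is not immune either. At a margin of 40, the same function returns a zero correct-class gradient with `to_f64`.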
