The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior
Laura Gomezjurado Gonzalez

TL;DR
This paper investigates the causes of long delays in transformer models' generalization on algorithmic tasks, highlighting the role of decoder access to learned structure and how numeral base affects learnability.
Contribution
It demonstrates that limited decoder access to learned representations causes delays, and shows how different numeral bases influence the ease of generalization.
Findings
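The encoder organizes parity and residue structure within the first few thousand training steps, but limited decoder access to that structure delays generalization.
Transplanting a trained encoder into a fresh model accelerates grokking by 2.75 times, whereas transplanting a trained decoder hurts; freezing a converged encoder and retraining only the decoder eliminates the plateau.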
Numeral base choice significantly impacts learnability and generalization.
Abstract
Grokking in transformers trained on algorithmic tasks is characterized by a long delay between training-set fit and abrupt generalization, but the source of that delay remains poorly understood. In encoder-decoder arithmetic models, we argue that this delay reflects limited access to already learned structure rather than failure to acquire that structure in the first place. We study one-step Collatz prediction and find that the encoder organizes parity and residue structure within the first few thousand training steps, while output accuracy remains near chance for tens of thousands more. Causal interventions support the decoder bottleneck hypothesis. Transplanting a trained encoder into a fresh model accelerates grokking by 2.75 times, while transplanting a trained decoder actively hurts. Freezing a converged encoder and retraining only the decoder eliminates the plateau entirely and…
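To make the task concrete, the following is a minimal sketch of how one-step Collatz training pairs could be generated in a chosen numeral base. It assumes the standard Collatz map (n/2 for even n, 3n + 1 for odd n) and a most-significant-digit-first tokenization; the summary above does not confirm either choice, and the names `collatz_step`, `to_digits`, and `make_example` are hypothetical.

```python
def collatz_step(n: int) -> int:
    """One step of the standard Collatz map (assumed variant, not confirmed by the source)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Even inputs are halved; odd inputs are sent to 3n + 1.
    return n // 2 if n % 2 == 0 else 3 * n + 1


def to_digits(n: int, base: int) -> list[int]:
    """Digits of a non-negative integer in the given base, most significant first."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits[::-1]


def make_example(n: int, base: int) -> tuple[list[int], list[int]]:
    """Hypothetical (input tokens, target tokens) pair for a sequence-to-sequence model."""
    return to_digits(n, base), to_digits(collatz_step(n), base)


if __name__ == "__main__":
    # The same input and target, tokenized in three different bases.
    for base in (2, 8, 10):
        print(base, make_example(27, base))
```

The base matters because it determines which properties of n are visible in its digits: in base 2 the parity of n is the last digit, so the branch of the map can be read off a single token, while in an odd base parity depends on every digit.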
