The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It
Gabriel Garcia

TL;DR
This paper investigates why large language models fail at counting tasks. It finds that count representations are present internally but are not aligned with the digit-token rows of the output head, and it proposes targeted fixes that improve counting accuracy.
Contribution
The study identifies counting failure in transformers as a readout bottleneck rather than a missing representation, and shows that two interventions improve counting performance: updating the digit rows of the output head and applying LoRA to attention.
Findings
Linear probes show counts are stored internally with high accuracy.
Updating output head digit rows improves constrained digit prediction.
LoRA on attention improves autoregressive counting accuracy.
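The second finding, updating only the digit rows of the output head, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the data is synthetic, `digit_ids` is a hypothetical set of token ids, and the sizes, learning rate, and plain gradient descent are not taken from the paper. The sketch only shows the mechanics of training a small subset of output-head rows on a constrained digit prediction task while every other row stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, n = 50, 32, 400          # hypothetical sizes
digit_ids = np.arange(10, 20)            # hypothetical token ids for "0".."9"

# Stand-in output head (vocab x d_model); a real model would supply this.
W = rng.normal(scale=0.02, size=(vocab, d_model))
W_before = W.copy()

# Synthetic hidden states: each true count has its own cluster centre.
labels = rng.integers(0, 10, size=n)
centers = rng.normal(size=(10, d_model))
hidden = centers[labels] + rng.normal(scale=0.1, size=(n, d_model))

def digit_accuracy(head):
    """Constrained prediction: argmax over the ten digit logits only."""
    logits = hidden @ head[digit_ids].T
    return float((logits.argmax(axis=1) == labels).mean())

acc_before = digit_accuracy(W)

# Gradient descent on softmax cross-entropy, touching only the digit rows.
for _ in range(300):
    logits = hidden @ W[digit_ids].T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(n), labels] -= 1.0            # dLoss/dlogits
    grad = probs.T @ hidden / n                   # shape (10, d_model)
    W[digit_ids] -= 0.1 * grad

acc_after = digit_accuracy(W)
print(f"digit accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

Because only `W[digit_ids]` is written to, the number of trainable parameters is the number of digit rows times the hidden size, which is why such an intervention is small compared with the full model.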
Abstract
Large language models often fail at simple counting tasks, even when the items to count are in the prompt. We investigate whether this failure occurs because transformers do not represent counts internally, or because they cannot convert those representations into the correct output tokens. Across three model families (Pythia, Qwen3, and Mistral), ranging from 0.4B to 14B parameters, we find evidence for the second explanation. Linear probes recover the correct count from intermediate layers with high accuracy, showing that the information is present. However, the internal directions that encode counts are nearly orthogonal to the digit-token output-head rows. In other words, the model stores the count in a form that the digit logits do not naturally read out. We localize this failure with two interventions. Updating only the digit rows of the output head (36,864 parameters)…
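The two measurements described in the abstract, a linear probe for the count and the cosine similarity between the probe direction and the digit rows of the output head, can be sketched as follows. This is a toy illustration on synthetic data, not the paper's setup: `hidden`, `count_dir`, and `digit_rows` are hypothetical stand-ins, and the digit rows are drawn at random, so their near-orthogonality to the probe direction is built in rather than discovered. The point is only to show how each quantity would be computed.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_samples = 256, 500            # hypothetical sizes

# Synthetic hidden states: the count is written along one hidden direction.
count_dir = rng.normal(size=d_model)
count_dir /= np.linalg.norm(count_dir)
counts = rng.integers(0, 10, size=n_samples).astype(float)
hidden = rng.normal(scale=0.1, size=(n_samples, d_model)) + np.outer(counts, count_dir)

# Linear probe: ordinary least squares with a bias column.
X = np.hstack([hidden, np.ones((n_samples, 1))])
w, *_ = np.linalg.lstsq(X, counts, rcond=None)
pred = X @ w
r2 = 1.0 - np.sum((counts - pred) ** 2) / np.sum((counts - counts.mean()) ** 2)

# Unit-norm direction the probe reads the count from.
probe_dir = w[:-1] / np.linalg.norm(w[:-1])

# Stand-in digit rows of the output head (random here by assumption).
digit_rows = rng.normal(size=(10, d_model))
digit_rows /= np.linalg.norm(digit_rows, axis=1, keepdims=True)

# Cosine similarity between the probe direction and each digit row.
cosines = digit_rows @ probe_dir
print(f"probe R^2 (in-sample): {r2:.3f}")
print("max |cosine| with digit rows:", float(np.abs(cosines).max()))
```

A high probe fit combined with small cosines is the pattern the abstract describes: the count is linearly recoverable, but it lies along a direction that the digit logits barely project onto. With a real model, `hidden` would come from an intermediate layer and `digit_rows` from the rows of the unembedding matrix for the digit tokens, and the probe would be scored on held-out data.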
