From Early Encoding to Late Suppression: Interpreting LLMs on Character Counting Tasks
Ayan Datta, Mounika Marreddy, Alexander Mehler, Zhixue Zhao, Radhika Mamidi

TL;DR
This paper investigates why large language models fail at simple character counting tasks. It finds that internal representations encode the correct information, but that this information is suppressed before it is expressed at the output layer, owing to structured interference.
Contribution
It uncovers the mechanism behind symbolic reasoning failures in LLMs, showing that they result from internal suppression circuits rather than from missing representations or insufficient scale.
Findings
Models encode character-level information in early layers but attenuate it in later layers.
Negative circuits in later layers suppress correct signals, leading to errors.
Symbolic reasoning failures are due to structured interference, not lack of knowledge.
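As a rough illustration of the encode-then-suppress pattern in the findings above, the following toy sketch reads a margin between two answer tokens off a residual stream after each layer, in the spirit of a logit lens. Everything here is an assumption made for illustration: the function name logit_lens_margin, the 2-dimensional residual stream, the unembedding vectors, and the per-layer updates are invented numbers, not values from the paper. A real logit lens would also apply the model's final normalization before unembedding, which this sketch omits.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def logit_lens_margin(layer_updates, unembed, correct, wrong):
    """Return logit(correct) - logit(wrong) after each layer's update.

    layer_updates: list of vectors, one additive residual update per layer.
    unembed: dict mapping a token string to its (toy) unembedding vector.
    """
    residual = [0.0] * len(layer_updates[0])
    margins = []
    for update in layer_updates:
        # Each layer adds its contribution to the running residual stream.
        residual = [r + u for r, u in zip(residual, update)]
        margins.append(dot(residual, unembed[correct]) - dot(residual, unembed[wrong]))
    return margins


# Hypothetical toy setup: axis 0 favours the answer "2", axis 1 favours "1".
unembed = {"2": [1.0, 0.0], "1": [0.0, 1.0]}
layer_updates = [
    [0.5, 0.0],   # early layer writes evidence for "2"
    [0.5, 0.0],   # mid layer reinforces it
    [-1.5, 0.5],  # late component writes against "2" and toward "1"
]

margins = logit_lens_margin(layer_updates, unembed, correct="2", wrong="1")
print(margins)
```

In this toy setting the margin is positive after the first two layers and turns negative after the last one, which is the qualitative shape the findings describe: the correct signal is present mid-stack and is overwritten late.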
Abstract
Large language models (LLMs) exhibit failures on elementary symbolic tasks such as character counting in a word, despite excelling on complex benchmarks. Although this limitation has been noted, the internal reasons remain unclear. We use character counting (e.g., "How many p's are in apple?") as a minimal, controlled probe that isolates token-level reasoning from higher-level confounds. Using this setting, we uncover a consistent phenomenon across modern architectures, including LLaMA, Qwen, and Gemma: models often compute the correct answer internally yet fail to express it at the output layer. Through mechanistic analysis combining probing classifiers, activation patching, logit lens analysis, and attention head tracing, we show that character-level information is encoded in early and mid-layer representations. However, this information is attenuated by a small set of components in…
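The probe format quoted in the abstract can be sketched as a small generator of prompts with ground-truth labels. This is a minimal sketch, not the authors' dataset code: the helper name make_example is hypothetical, the prompt template simply copies the abstract's example, and case-sensitive matching is an assumption.

```python
def make_example(word, char):
    """Build one character-counting prompt and its ground-truth count.

    Assumption: matching is case-sensitive and counts non-overlapping
    occurrences of a single character, as str.count does.
    """
    prompt = f"How many {char}'s are in {word}?"
    return prompt, word.count(char)


prompt, answer = make_example("apple", "p")
print(prompt, answer)
```

Pairing each prompt with a programmatically computed label is what makes the task a controlled probe: the correct answer is never in doubt, so any error can be attributed to the model.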
