From Early Encoding to Late Suppression: Interpreting LLMs on Character Counting Tasks

Ayan Datta; Mounika Marreddy; Alexander Mehler; Zhixue Zhao; Radhika Mamidi

arXiv:2604.00778 · cs.CL · April 2, 2026

TL;DR

This paper investigates why large language models fail at simple character counting tasks. It finds that internal representations encode the correct information, which is then suppressed at the output layer by structured interference.

Contribution

The paper uncovers the mechanism behind symbolic reasoning failures in LLMs, showing that they result from internal suppression circuits rather than from missing representations or insufficient scale.

Findings

01

Models encode character information in early layers but attenuate it in later layers.

02

Negative circuits in later layers suppress correct signals, leading to errors.

03

Symbolic reasoning failures are due to structured interference, not lack of knowledge.
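
One way to picture findings 01 and 02 is a toy logit-lens trace: project the residual stream after each layer onto the output vocabulary and watch how the margin for the correct answer changes. The sketch below is purely illustrative. The two-dimensional residual stream, the unembedding vectors, and every number in `LAYER_UPDATES` are made up for this example and are not results from the paper. It also skips the final layer norm that a real logit lens would apply.

```python
# Toy logit-lens trace. All values are synthetic and chosen only to
# illustrate "early encoding, late suppression"; they are not paper data.

# Hypothetical unembedding vectors for two candidate answer tokens.
UNEMBED = {"2": [1.0, 0.0], "3": [0.0, 1.0]}

# Hypothetical per-layer updates added to the residual stream.
# The last two layers write against the correct-answer direction.
LAYER_UPDATES = [
    [0.5, 0.0],
    [1.0, 0.25],
    [0.5, 0.25],
    [-1.5, 0.5],
    [-1.0, 0.5],
]


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def logit_margin_trace(updates, correct, wrong):
    """Return logit(correct) - logit(wrong) after each layer.

    The residual stream is the running sum of the layer updates, and the
    'lens' is a direct projection onto the unembedding vectors.
    """
    residual = [0.0] * len(updates[0])
    trace = []
    for update in updates:
        residual = [r + u for r, u in zip(residual, update)]
        margin = dot(residual, UNEMBED[correct]) - dot(residual, UNEMBED[wrong])
        trace.append(margin)
    return trace


trace = logit_margin_trace(LAYER_UPDATES, correct="2", wrong="3")
peak_layer = max(range(len(trace)), key=lambda i: trace[i])
print(trace)       # margin per layer
print(peak_layer)  # index of the layer where the correct answer is strongest
```

In this toy setup the margin for the correct token grows through the first three layers and turns negative once the last two layers subtract from it, which is the qualitative pattern the findings describe.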

Abstract

Large language models (LLMs) exhibit failures on elementary symbolic tasks such as character counting in a word, despite excelling on complex benchmarks. Although this limitation has been noted, the internal reasons remain unclear. We use character counting (e.g., "How many p's are in apple?") as a minimal, controlled probe that isolates token-level reasoning from higher-level confounds. Using this setting, we uncover a consistent phenomenon across modern architectures, including LLaMA, Qwen, and Gemma: models often compute the correct answer internally yet fail to express it at the output layer. Through mechanistic analysis combining probing classifiers, activation patching, logit lens analysis, and attention head tracing, we show that character-level information is encoded in early and mid-layer representations. However, this information is attenuated by a small set of components in…
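
The character counting probe described in the abstract is easy to reproduce in outline. The following is a minimal sketch of how such prompts and their ground-truth answers might be generated. The prompt template follows the abstract's example, but the function names, the case-insensitive counting, and the word list are assumptions made for illustration, not the authors' dataset or code.

```python
# Minimal sketch of a character counting probe set.
# Function names and the word list are hypothetical.

def count_char(word: str, char: str) -> int:
    """Ground-truth count of `char` in `word`, ignoring case (an assumption)."""
    return word.lower().count(char.lower())


def make_probe(word: str, char: str) -> dict:
    """Build one prompt/answer pair in the style of the abstract's example."""
    return {
        "prompt": f"How many {char}'s are in {word}?",
        "answer": count_char(word, char),
    }


# Hypothetical (word, character) pairs.
pairs = [("apple", "p"), ("banana", "a"), ("strawberry", "r")]
probes = [make_probe(word, char) for word, char in pairs]

for probe in probes:
    print(probe["prompt"], "->", probe["answer"])
```

Because the ground truth is computed directly from the string, any mismatch between a model's output and `answer` can be attributed to the model rather than to labeling noise.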
