When Models Manipulate Manifolds: The Geometry of a Counting Task
Wes Gurnee, Emmanuel Ameisen, Isaac Kauvar, Julius Tarng, Adam Pearce, Chris Olah, Joshua Batson

TL;DR
This paper investigates how Claude 3.5 Haiku, which receives only sequences of tokens, tracks a visual property of text: where to break a line in fixed-width text. The model does this by geometrically transforming low-dimensional manifolds that encode character counts.
Contribution
It traces the geometric and feature-level mechanism behind linebreaking in Claude 3.5 Haiku: character counts are represented on low-dimensional manifolds, and attention heads transform those manifolds to estimate the distance to the line boundary and decide when to break.
Findings
Character counts are represented on low-dimensional curved manifolds.
Attention heads twist manifolds to estimate distances for linebreaking.
Visual illusions can hijack the counting mechanism in models.
Abstract
Language models can perceive visual properties of text despite receiving only sequences of tokens. We mechanistically investigate how Claude 3.5 Haiku accomplishes one such task: linebreaking in fixed-width text. We find that character counts are represented on low-dimensional curved manifolds discretized by sparse feature families, analogous to biological place cells. Accurate predictions emerge from a sequence of geometric transformations: token lengths are accumulated into character count manifolds, attention heads twist these manifolds to estimate distance to the line boundary, and the decision to break the line is enabled by arranging estimates orthogonally to create a linear decision boundary. We validate our findings through causal interventions and discover visual illusions: character sequences that hijack the counting mechanism. Our work demonstrates the rich sensory processing…
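To make the task itself concrete, here is a minimal Python sketch of the arithmetic that linebreaking requires: accumulate token lengths into a running character count, subtract from the line width to get the distance to the boundary, and break if the next token does not fit. This only illustrates the computation the model must approximate, not the geometric mechanism the paper describes; the function names `running_counts` and `should_break` and the example line width are hypothetical.

```python
from itertools import accumulate


def running_counts(tokens):
    """Cumulative character count after each token on the current line."""
    return list(accumulate(len(t) for t in tokens))


def should_break(tokens, next_token, line_width):
    """Return True if next_token would overflow the current line.

    tokens: tokens already placed on the current line (assumed input format).
    line_width: maximum characters per line (assumed to be known here; the
    model has to infer it from earlier lines).
    """
    char_count = sum(len(t) for t in tokens)   # characters used so far
    remaining = line_width - char_count        # distance to the line boundary
    return len(next_token) > remaining         # break if the token does not fit


line = ["The", " quick", " brown"]
print(running_counts(line))               # cumulative counts per token
print(should_break(line, " fox", 20))     # short token, fits in remaining space
print(should_break(line, " foxes!", 20))  # longer token, would overflow
```

The explicit subtraction and comparison here are what the abstract describes the model implementing through manifold transformations and a linear decision boundary.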
Taxonomy
Topics: Reading and Literacy Development · Tactile and Sensory Interactions · Neurobiology of Language and Bilingualism
