Modular Arithmetic: Language Models Solve Math Digit by Digit
Tanja Baeumel, Daniil Gurgurov, Yusser al Ghussin, Josef van Genabith, Simon Ostermann

TL;DR
This paper shows that large language models represent and process numbers digit by digit. It identifies digit-position-specific circuits that the models use for simple arithmetic, and finds these circuits across model sizes and tokenization strategies.
Contribution
The paper provides evidence that LLMs contain digit-position-specific circuits that operate independently on different digit positions, advancing the understanding of their internal arithmetic mechanisms.
Findings
Existence of digit-position-specific circuits in LLMs
Circuits operate independently across digit positions
Interventions confirm causal role in arithmetic solving
Abstract
While recent work has begun to uncover the internal strategies that Large Language Models (LLMs) employ for simple arithmetic tasks, a unified understanding of their underlying mechanisms is still lacking. We extend recent findings showing that LLMs represent numbers in a digit-wise manner and present evidence for the existence of digit-position-specific circuits that LLMs use to perform simple arithmetic tasks, i.e. modular subgroups of MLP neurons that operate independently on different digit positions (units, tens, hundreds). Notably, such circuits exist independently of model size and of tokenization strategy, i.e. both for models that encode longer numbers digit-by-digit and as one token. Using Feature Importance and Causal Interventions, we identify and validate the digit-position-specific circuits, revealing a compositional and interpretable structure underlying the solving of…
Taxonomy
Topics: Cognitive and developmental aspects of mathematical skills · Mathematics, Computing, and Information Processing · Ferroelectric and Negative Capacitance Devices
