Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer
Ishita Darade, Sushrut Thorat

TL;DR
This paper investigates how a Transformer model performs base-digit extraction, showing it represents intermediate calculations but does not causally use them in the way simple probes suggest, highlighting the difference between representation and computation.
Contribution
It demonstrates that Transformers can encode explicit algorithmic intermediates but do not causally utilize these intermediates in the straightforward manner suggested by probes.
Findings
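Across three seeds, the model reaches 99.83% exact-answer accuracy on held-out number-base intersections of the base-digit extraction task.
Linear probes decode the candidate algorithmic intermediates from the model's internal representations.
Causal tests show that the model does not use those intermediates in the straightforward way the probes suggest.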
Abstract
Structured prompts require integrating components according to task-relevant relations. How a network implements this integration is often hard to judge in language or vision, where those relations are rarely specified precisely enough to define a candidate internal algorithm. Arithmetic offers a cleaner setting. We study a Transformer trained on base-digit extraction: given $n$, $b$, and $k$, it must report the coefficient of $b^k$ in the base-$b$ expansion of $n$. The closed-form solution, $\lfloor n / b^k \rfloor \bmod b$, provides explicit candidate algorithmic intermediates. Across three seeds, the model reaches 99.83% exact-answer accuracy on held-out number-base intersections, establishing reliable task competence. Linear probes decode the intermediates, making staged arithmetic computation plausible. Causal tests then separate representation from use: within the localized route…
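To make the task concrete, the following is a minimal Python sketch of base-digit extraction using the standard floor-divide-then-modulo form. It is not the authors' code; the names `n`, `b`, `k`, `base_digit`, and `base_digits` are illustrative assumptions, and the two named steps are only one natural reading of what "candidate algorithmic intermediates" could mean.

```python
def base_digit(n: int, b: int, k: int) -> int:
    """Return the coefficient of b**k in the base-b expansion of n.

    Assumed conventions (not taken from the paper): n >= 0, b >= 2, k >= 0,
    with k = 0 denoting the least significant digit.
    """
    if n < 0 or b < 2 or k < 0:
        raise ValueError("expected n >= 0, b >= 2, k >= 0")
    power = b ** k          # first intermediate: the place value b^k
    shifted = n // power    # second intermediate: floor(n / b^k)
    return shifted % b      # final answer: floor(n / b^k) mod b


def base_digits(n: int, b: int) -> list[int]:
    """Full base-b expansion of n, least significant digit first.

    Used here only as an independent cross-check of base_digit.
    """
    if n < 0 or b < 2:
        raise ValueError("expected n >= 0, b >= 2")
    digits = []
    while n > 0:
        digits.append(n % b)
        n //= b
    return digits or [0]


if __name__ == "__main__":
    print(base_digit(1234, 10, 2))  # hundreds digit of 1234
    print(base_digit(13, 2, 2))     # bit 2 of 0b1101
    print(base_digits(13, 2))       # all bits, least significant first
```

A probe in this setting would try to read quantities such as `power` or `shifted` from the model's activations; the paper's point is that decoding them does not by itself show the model computes with them.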
