Finding Interpretable Prompt-Specific Circuits in Language Models

Gabriel Franco; Lucas M. Tassis; Azalea Rohr; Mark Crovella

arXiv:2602.13483 · cs.LG · May 15, 2026

TL;DR

This paper introduces ACC++, an improved circuit-tracing method that identifies interpretable, prompt-specific circuits in language models, including circuits whose cross-language structure reflects linguistic relatedness.

Contribution

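ACC++ improves on circuit tracing based on attention-causal communication (ACC) by extracting the signals that cause attention from a single forward pass, without replacement models or patching. This supports interpretability and cross-lingual analysis of attention mechanisms in language models.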

Findings

01. Many ACC++ signals are interpretable with natural language descriptions.

02. Prompt-specific circuits form well-defined clusters with distinct mechanisms.

03. Cross-language circuits reflect linguistic relatedness.

Abstract

Understanding the internal circuits that language models use to solve tasks remains a central challenge in mechanistic interpretability. A crucial part of finding circuits is understanding why each attention head attends where it does. To this end, we introduce ACC++, an improved circuit-tracing method based on the principle of attention-causal communication (ACC) [1], which identifies signals, i.e., contents of low-dimensional subspaces that cause attention on a token pair. ACC++ extracts circuits from a single forward pass, without replacement models or patching. Circuits identified by ACC++ consist of components that are causal for the model's attention decisions, together with the low-dimensional signals used to communicate between them. Here, we first detail the conceptual advances that ACC++ makes over previous work. We then show that across multiple models, a substantial portion…
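To make the idea of a "signal" in a low-dimensional subspace more concrete, the sketch below decomposes a single pre-softmax attention score into rank-1 terms and picks the few terms that account for most of it. This is an illustration of the general idea only, not the ACC++ procedure: the use of an SVD of the combined query-key matrix, the function name `attention_signal_components`, the `mass` threshold of 0.9, and the random weights are all assumptions introduced here. The usual `1/sqrt(d_head)` scaling is omitted because it does not change which terms dominate.

```python
import numpy as np

def attention_signal_components(x_dst, x_src, W_Q, W_K, mass=0.9):
    """Split one pre-softmax attention score into rank-1 SVD terms.

    Illustrative only: the SVD basis and the `mass` threshold are
    assumptions of this sketch, not the paper's stated procedure.
    """
    d_head = W_Q.shape[1]
    # score = (x_dst @ W_Q) . (x_src @ W_K) = x_dst @ W_QK @ x_src
    W_QK = W_Q @ W_K.T                       # (d_model, d_model), rank <= d_head
    U, S, Vt = np.linalg.svd(W_QK)
    U, S, Vt = U[:, :d_head], S[:d_head], Vt[:d_head]
    # Contribution of each singular direction pair; these sum to the score.
    contrib = (x_dst @ U) * S * (Vt @ x_src)
    score = contrib.sum()
    if score <= 0:
        # No positive score to explain, so no components are selected.
        return score, contrib, np.array([], dtype=int)
    # Smallest set of largest terms whose sum reaches `mass` of the score.
    order = np.argsort(-contrib)
    cum = np.cumsum(contrib[order])
    k = int(np.argmax(cum >= mass * score)) + 1
    return score, contrib, order[:k]

# Toy usage with random weights (hypothetical dimensions).
rng = np.random.default_rng(0)
d_model, d_head = 16, 4
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))
x_dst = rng.normal(size=d_model)   # residual stream at the attending token
x_src = rng.normal(size=d_model)   # residual stream at the attended-to token
score, contrib, selected = attention_signal_components(x_dst, x_src, W_Q, W_K)
print(score, selected)
```

In this toy setting, the selected indices name the directions in the two residual streams that jointly drive the score for this token pair, which is the kind of low-dimensional content the abstract calls a signal.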
