Investigating the Indirect Object Identification circuit in Mamba
Danielle Ensign, Adrià Garriga-Alonso

TL;DR
This paper explores the interpretability of the Mamba recurrent architecture by adapting existing techniques to partially reverse-engineer the circuit responsible for the Indirect Object Identification (IOI) task, revealing a key bottleneck layer and the mechanisms it uses.
Contribution
It provides initial evidence that circuit-based interpretability tools can be used to analyze the Mamba architecture, and it identifies specific circuit components involved in IOI processing.
Findings
Layer 39 is a key bottleneck in the circuit
Convolutions in Layer 39 shift names forward by one position
Name entities are stored linearly in Layer 39's SSM
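The second finding concerns the short causal convolution that each Mamba block applies along the sequence before its SSM. The sketch below is a pure-Python toy, not the authors' code: the function name `causal_depthwise_conv1d` is hypothetical, and the kernel width of 4 and the hand-set weights are illustrative assumptions. It shows how a per-channel kernel that puts all of its weight on the previous-token tap copies each token's features to the next position, which is one way a convolution can shift names one position forward.

```python
from typing import List

Matrix = List[List[float]]  # rows = token positions, columns = channels


def causal_depthwise_conv1d(x: Matrix, w: Matrix) -> Matrix:
    """Hypothetical toy: per-channel causal convolution with left zero-padding.

    y[t][c] = sum_k w[c][k] * x[t - (K - 1) + k][c], where K is the kernel width,
    so tap k = K - 1 reads the current token and tap k = K - 2 the previous one.
    """
    seq_len, channels, width = len(x), len(x[0]), len(w[0])
    y = [[0.0] * channels for _ in range(seq_len)]
    for t in range(seq_len):
        for c in range(channels):
            acc = 0.0
            for k in range(width):
                src = t - (width - 1) + k
                if src >= 0:  # positions before the sequence start are zero padding
                    acc += w[c][k] * x[src][c]
            y[t][c] = acc
    return y


# Assumed kernel width 4; every channel puts all its weight on the previous-token tap.
shift_kernel = [[0.0, 0.0, 1.0, 0.0] for _ in range(2)]

# Toy 2-channel features for four tokens (e.g. a name token at position 2).
features = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
shifted = causal_depthwise_conv1d(features, shift_kernel)
print(shifted)  # [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
```

In the trained model the Layer 39 kernel weights are learned rather than hand-set, and the convolution output passes through a nonlinearity before reaching the SSM, so this only illustrates the direction of the effect.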
Abstract
How well will current interpretability techniques generalize to future models? A relevant case study is Mamba, a recent recurrent architecture whose scaling is comparable to that of Transformers. We adapt pre-Mamba techniques to Mamba and partially reverse-engineer the circuit responsible for the Indirect Object Identification (IOI) task. Our techniques provide evidence that 1) Layer 39 is a key bottleneck, 2) convolutions in Layer 39 shift names one position forward, and 3) the name entities are stored linearly in Layer 39's SSM. Finally, we adapt an automatic circuit discovery tool, positional Edge Attribution Patching, to identify a Mamba IOI circuit. Our contributions provide initial evidence that circuit-based mechanistic interpretability tools work well for the Mamba architecture.
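Edge Attribution Patching (EAP) replaces one activation-patching run per edge with a first-order estimate: an edge's score is the difference between its corrupted and clean activations, dotted with the gradient of the task metric taken on the clean run. The following is a minimal sketch under stated assumptions, not the authors' implementation: `positional_eap_scores`, `linear_metric`, and `patch_position` are hypothetical helpers, we assume "positional" means keeping one score per token position instead of summing over positions, and a toy linear metric with a hand-written gradient stands in for the IOI logit difference and a backward pass through Mamba.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = token positions, columns = hidden dimensions


def positional_eap_scores(clean: Matrix, corrupt: Matrix, grad: Matrix) -> List[float]:
    """Hypothetical helper: one first-order attribution score per position.

    score[pos] = sum_d (corrupt[pos][d] - clean[pos][d]) * grad[pos][d],
    where grad is d(metric)/d(activation) evaluated on the clean run.
    """
    return [
        sum((bad - ok) * g for ok, bad, g in zip(row_ok, row_bad, row_g))
        for row_ok, row_bad, row_g in zip(clean, corrupt, grad)
    ]


def linear_metric(h: Matrix, w: Matrix) -> float:
    """Toy stand-in for the IOI logit difference: linear in the activations."""
    return sum(hv * wv for h_row, w_row in zip(h, w) for hv, wv in zip(h_row, w_row))


def patch_position(clean: Matrix, corrupt: Matrix, pos: int) -> Matrix:
    """Actual activation patching: use the corrupted activation at one position only."""
    return [list(corrupt[p]) if p == pos else list(clean[p]) for p in range(len(clean))]


# Illustrative activations for one edge over three token positions.
clean = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]
corrupt = [[1.0, 0.0], [-0.5, 2.0], [0.0, 3.0]]
w = [[0.2, 0.1], [1.0, -1.0], [0.3, 0.5]]  # gradient of linear_metric is w itself

scores = positional_eap_scores(clean, corrupt, w)
print(scores)  # [0.0, -1.0, 1.0]

# For a linear metric, the first-order estimate equals the true patching effect.
for pos, score in enumerate(scores):
    exact = linear_metric(patch_position(clean, corrupt, pos), w) - linear_metric(clean, w)
    assert math.isclose(exact, score, abs_tol=1e-9)
```

For a linear metric the estimate matches actual patching exactly; for a real network it is only an approximation, so it can be worth confirming high-scoring edges with direct activation patching.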
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Machine Learning and Algorithms
Methods: Activation Patching · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
