Investigating the Indirect Object Identification circuit in Mamba

Danielle Ensign; Adrià Garriga-Alonso

arXiv:2407.14008 · cs.LG · July 23, 2024

TL;DR

This paper explores the interpretability of the Mamba recurrent architecture by adapting existing techniques to reverse-engineer the circuit responsible for the Indirect Object Identification task, revealing key layers and mechanisms involved.

Contribution

The paper provides initial evidence that circuit-based interpretability tools work on the Mamba architecture, and it identifies specific components of the circuit involved in IOI processing.

Findings

01. Layer 39 is a key bottleneck in the circuit

02. Convolutions in Layer 39 shift names forward by one position

03. Name entities are stored linearly in Layer 39's SSM
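
To make the second finding concrete, here is a minimal sketch of how a causal convolution can move information one position forward. It is a toy illustration, not the authors' code: the helper `causal_conv1d`, the one-hot input, and the hand-picked kernel are all assumptions, and the kernel width of 4 is assumed to match Mamba's usual short convolution.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1D convolution over a sequence of scalars (hypothetical helper).

    kernel[-1] weights the current position, kernel[-2] the previous one,
    and so on, so output[t] depends only on x[t], x[t-1], ...
    """
    k = len(kernel)
    # Left-pad with zeros so that no output position can see the future.
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([padded[t:t + k] @ kernel for t in range(len(x))])

# A one-hot "name marker" at position 1 (toy input, assumed for illustration).
x = np.array([0.0, 1.0, 0.0, 0.0, 0.0])

# Hand-picked kernel that puts all weight on the previous position.
shift_kernel = np.array([0.0, 0.0, 1.0, 0.0])

shifted = causal_conv1d(x, shift_kernel)
print(shifted)  # the marker now sits at position 2
```

In a trained model the kernel weights are learned per channel and need not be exactly one-hot; the sketch only shows that a single causal tap on the previous position is enough to produce a one-position shift.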

Abstract

How well will current interpretability techniques generalize to future models? A relevant case study is Mamba, a recent recurrent architecture with scaling comparable to Transformers. We adapt pre-Mamba techniques to Mamba and partially reverse-engineer the circuit responsible for the Indirect Object Identification (IOI) task. Our techniques provide evidence that 1) Layer 39 is a key bottleneck, 2) Convolutions in layer 39 shift names one position forward, and 3) The name entities are stored linearly in Layer 39's SSM. Finally, we adapt an automatic circuit discovery tool, positional Edge Attribution Patching, to identify a Mamba IOI circuit. Our contributions provide initial evidence that circuit-based mechanistic interpretability tools work well for the Mamba architecture.
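
As a rough illustration of the idea behind attribution patching, on which Edge Attribution Patching builds, the sketch below compares a first-order estimate of a patching effect with the exact effect on a toy metric. Everything here is assumed for illustration: the `tanh` metric, the weight vector `w`, and the activation values are made up, the sign convention (corrupted minus clean) varies between write-ups, and the per-position bookkeeping of the positional variant is not modelled.

```python
import numpy as np

def metric(act, w):
    """Toy scalar metric computed from one activation vector (assumed)."""
    return float(w @ np.tanh(act))

def metric_grad(act, w):
    """Analytic gradient of the toy metric with respect to the activation."""
    return w * (1.0 - np.tanh(act) ** 2)

def attribution_score(clean_act, corrupt_act, w):
    """First-order estimate of the metric change caused by replacing the
    clean activation with the corrupted one."""
    return float((corrupt_act - clean_act) @ metric_grad(clean_act, w))

# Made-up values standing in for activations on clean and corrupted prompts.
w = np.array([1.0, -2.0, 0.5])
clean = np.array([0.1, 0.2, -0.3])
corrupt = np.array([0.15, 0.1, -0.25])

approx = attribution_score(clean, corrupt, w)    # linear approximation
exact = metric(corrupt, w) - metric(clean, w)    # true effect of patching
print(approx, exact)
```

The appeal of the approximation is cost: exact patching needs one forward pass per patched component, whereas the linear estimate scores every component from one clean run, one corrupted run, and one backward pass.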

Code & Models

Repositories

Phylliida/investigating-mamba-ioi (official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Machine Learning and Algorithms

Methods: Activation Patching · Mamba: Linear-Time Sequence Modeling with Selective State Spaces