Discovering Variable Binding Circuitry with Desiderata

Xander Davies; Max Nadeau; Nikhil Prakash; Tamar Rott Shaham; David Bau

arXiv:2307.03637·cs.AI·July 10, 2023

Open Access

TL;DR

This paper introduces a method to automatically identify specific model components responsible for subtasks in language models by specifying desired causal attributes, demonstrated by discovering variable binding circuitry in LLaMA-13B.

Contribution

The paper presents a novel approach extending causal mediation experiments to automatically find model components responsible for subtasks using desiderata, applied to variable binding in LLaMA-13B.

Findings

01. Localized variable binding to 9 attention heads and 1 MLP in LLaMA-13B

02. Successfully identified components responsible for arithmetic variable retrieval

03. Method generalizes causal mediation for automatic circuit discovery

Abstract

Recent work has shown that computation in language models may be human-understandable, with successful efforts to localize and intervene on both single-unit features and input-output circuits. Here, we introduce an approach which extends causal mediation experiments to automatically identify model components responsible for performing a specific subtask by solely specifying a set of desiderata, or causal attributes of the model components executing that subtask. As a proof of concept, we apply our method to automatically discover shared variable binding circuitry in LLaMA-13B, which retrieves variable values for multiple arithmetic tasks. Our method successfully localizes variable binding to only 9 attention heads (of the 1.6k) and one MLP in the final token's residual stream.
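
The idea of specifying a desideratum and then searching for the components that satisfy it can be illustrated with a toy sketch. This is not the authors' implementation and does not touch LLaMA-13B: the three-component "model", the component names, the example format (original input, counterfactual input, target output), and the exhaustive subset search are all assumptions made for illustration. The desideratum here is: patching the right components from a counterfactual run should swap in the counterfactual variable value while leaving the operand untouched.

```python
from itertools import combinations

# Hypothetical toy "model": each named component writes one number,
# and the model output is the sum of all component outputs.
COMPONENTS = {
    "value_head": lambda inp: float(inp["value"]),      # retrieves the variable's value
    "operand_head": lambda inp: float(inp["operand"]),  # reads the other operand
    "idle_head": lambda inp: 0.0,                       # contributes nothing
}

def run(inp, patch=None):
    """Run the toy model, optionally overwriting some component outputs."""
    patch = patch or {}
    acts = {
        name: patch[name] if name in patch else fn(inp)
        for name, fn in COMPONENTS.items()
    }
    return sum(acts.values()), acts

# Each desideratum example: (original input, counterfactual input, target output).
# The target is what the model should output if ONLY the variable value
# were taken from the counterfactual input.
EXAMPLES = [
    ({"value": 3, "operand": 2}, {"value": 7, "operand": 5}, 9.0),
    ({"value": 1, "operand": 4}, {"value": 6, "operand": 9}, 10.0),
]

def satisfies(subset, examples):
    """Check whether patching `subset` from the counterfactual run hits every target."""
    for original, counterfactual, target in examples:
        _, cf_acts = run(counterfactual)
        patched_out, _ = run(original, {name: cf_acts[name] for name in subset})
        if patched_out != target:
            return False
    return True

def find_components(examples):
    """Return the smallest component subset satisfying the desideratum, or None."""
    names = list(COMPONENTS)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            if satisfies(subset, examples):
                return subset
    return None

print(find_components(EXAMPLES))  # ('value_head',)
```

Exhaustive search is only feasible here because the toy has three components; with roughly 1.6k attention heads, as in the abstract, a scalable search procedure would be required instead.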

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)