What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C.Y. Chan, Andrew M. Saxe

TL;DR
This study examines how induction heads form in transformer models, using controlled experiments on synthetic data to reveal the subcircuit interactions and conditions their development requires.
Contribution
It introduces a causal framework for analyzing induction head formation and identifies the key subcircuits, and the interactions among them, that enable in-context learning.
Findings
Identified three subcircuits driving induction head formation
Demonstrated the additive nature of induction head components
Linked subcircuit interactions to phase change timing
Abstract
In-context learning is a powerful emergent ability in transformer models. Prior work in mechanistic interpretability has identified a circuit element that may be critical for in-context learning -- the induction head (IH), which performs a match-and-copy operation. During training of large transformers on natural language data, IHs emerge around the same time as a notable phase change in the loss. Despite the robust evidence for IHs and this interesting coincidence with the phase change, relatively little is known about the diversity and emergence dynamics of IHs. Why is there more than one IH, and how are they dependent on each other? Why do IHs appear all of a sudden, and what are the subcircuits that enable them to emerge? We answer these questions by studying IH emergence dynamics in a controlled setting by training on synthetic data. In doing so, we develop and share a novel…
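To make the match-and-copy operation mentioned in the abstract concrete, here is a minimal symbolic sketch in Python. It is a toy illustration, not the paper's model or code: the function name `induction_predict` and the hard list lookup are assumptions made for this example, whereas an induction head in a trained transformer performs the matching softly through attention weights.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy match-and-copy rule (hypothetical helper, not from the paper).

    Match: find the most recent earlier occurrence of the current (last) token.
    Copy:  return the token that followed that earlier occurrence.
    Returns None when the current token has not appeared earlier in the context.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:      # match step
            return tokens[i + 1]      # copy step
    return None


if __name__ == "__main__":
    # Context "A B C A": the earlier "A" was followed by "B".
    print(induction_predict(["A", "B", "C", "A"]))
```

Given the context `A B C A`, the rule finds the earlier `A` and copies the token that followed it. The sketch only captures the input-output behavior; it says nothing about how the subcircuits studied in the paper implement or learn it.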
Taxonomy
Topics: Analog and Mixed-Signal Circuit Design
