Which Attention Heads Matter for In-Context Learning?
Kayo Yin, Jacob Steinhardt

TL;DR
This paper investigates the mechanisms behind in-context learning (ICL) in large language models. It finds that function vector (FV) heads are primarily responsible for few-shot ICL performance, and that many FV heads begin as induction heads during training, which may facilitate learning the FV mechanism.
Contribution
It identifies FV heads as the main driver of ICL in large models and uncovers their developmental connection to induction heads during training.
Findings
FV heads are crucial for ICL, especially in larger models.
Many FV heads originate as induction heads before transitioning to the FV mechanism.
Induction heads may facilitate learning the complex FV mechanism.
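To make the induction mechanism named in these findings concrete, here is a minimal sketch of the rule an induction head is usually described as implementing: look back for an earlier occurrence of the current token and copy the token that followed it. This is a toy over plain token lists, not attention weights, and the function name induction_predict is hypothetical rather than taken from the paper.

```python
def induction_predict(tokens):
    """Toy induction rule (hypothetical helper, not from the paper).

    Finds the most recent earlier occurrence of the last token and
    returns the token that followed it, or None if there is no match.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the last one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy whatever came right after the earlier occurrence.
            return tokens[i + 1]
    return None


prediction = induction_predict(["A", "B", "C", "A"])
```

On the sequence A B C A the rule matches the first A and copies B. An FV head, by contrast, is described as encoding the task itself rather than copying a specific token, which is why it is the more complex of the two mechanisms.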
Abstract
Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to perform new tasks using only a few demonstrations in the prompt. Two different mechanisms have been proposed to explain ICL: induction heads that find and copy relevant tokens, and function vector (FV) heads whose activations compute a latent encoding of the ICL task. To better understand which of the two distinct mechanisms drives ICL, we study and compare induction heads and FV heads in 12 language models. Through detailed ablations, we discover that few-shot ICL performance depends primarily on FV heads, especially in larger models. In addition, we uncover that FV and induction heads are connected: many FV heads start as induction heads during training before transitioning to the FV mechanism. This leads us to speculate that induction facilitates learning the more complex FV…
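The ablations mentioned in the abstract can be pictured with a small sketch. The abstract does not say which ablation variant is used, so this assumes mean ablation, one common choice: the output of each selected head is replaced by its average output over some reference set, and the layer output is recomputed. The helpers mean_ablate and layer_output, the head identifiers, and the assumption that a layer's output is the sum of its heads' outputs are all illustrative.

```python
def layer_output(head_outputs):
    """Sum per-head output vectors into one layer output vector."""
    dim = len(next(iter(head_outputs.values())))
    total = [0.0] * dim
    for vec in head_outputs.values():
        for j, x in enumerate(vec):
            total[j] += x
    return total


def mean_ablate(head_outputs, mean_outputs, heads_to_ablate):
    """Return a copy of head_outputs with the selected heads replaced
    by their mean output (assumed ablation variant, see lead-in)."""
    return {
        head: list(mean_outputs[head]) if head in heads_to_ablate else list(vec)
        for head, vec in head_outputs.items()
    }


# Hypothetical two-head layer with 2-dimensional outputs.
outputs = {"h0": [1.0, 2.0], "h1": [3.0, 4.0]}
means = {"h0": [0.5, 0.5], "h1": [0.0, 0.0]}
ablated = mean_ablate(outputs, means, {"h1"})
```

Comparing task accuracy with and without such a replacement, for a set of FV heads versus a set of induction heads, is the kind of comparison that lets one say which set ICL performance depends on.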
Taxonomy
Topics: Neuroscience, Education and Cognitive Function
