Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning
Haolin Yang, Hakaze Cho, Yiqiao Zhong, Naoya Inoue

TL;DR
This paper introduces a geometric framework for understanding in-context learning in large language models, linking attention heads and task vectors through hidden state dynamics to explain how models perform classification tasks.
Contribution
It unifies the roles of attention heads and task vectors in ICL by analyzing hidden state geometry, revealing a two-stage process of separability and alignment across layers.
Findings
Separability of query hidden states emerges in early layers, driven by Previous Token Heads.
Alignment develops in later layers, facilitated by Induction Heads and task vectors.
The framework bridges attention heads and task vectors, explaining ICL mechanisms.
Abstract
The unusual properties of in-context learning (ICL) have prompted investigations into the internal mechanisms of large language models. Prior work typically focuses on either special attention heads or task vectors at specific layers, but lacks a unified framework linking these components to the evolution of hidden states across layers that ultimately produce the model's output. In this paper, we propose such a framework for ICL in classification tasks by analyzing two geometric factors that govern performance: the separability and alignment of query hidden states. A fine-grained analysis of layer-wise dynamics reveals a striking two-stage mechanism: separability emerges in early layers, while alignment develops in later layers. Ablation studies further show that Previous Token Heads drive separability, while Induction Heads and task vectors enhance alignment. Our findings thus bridge…
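To make the two geometric factors concrete, the following is a minimal sketch on synthetic data, not the paper's actual metrics or code. It assumes a binary task, uses nearest-centroid accuracy as a stand-in for separability, and uses the cosine between the class-mean difference and the difference of the two label-token unembedding vectors as a stand-in for alignment. The function names, the toy dimensions, and the synthetic hidden states are all hypothetical.

```python
import numpy as np


def separability(hidden, labels):
    """Proxy (assumption): nearest-centroid accuracy on query hidden states."""
    centroids = np.stack([hidden[labels == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(hidden[:, None, :] - centroids[None, :, :], axis=2)
    return float((dists.argmin(axis=1) == labels).mean())


def alignment(hidden, labels, unembed):
    """Proxy (assumption): cosine between the class-mean difference and
    the difference of the two label-token unembedding vectors."""
    mean_diff = hidden[labels == 1].mean(axis=0) - hidden[labels == 0].mean(axis=0)
    label_diff = unembed[1] - unembed[0]
    denom = np.linalg.norm(mean_diff) * np.linalg.norm(label_diff)
    return float(mean_diff @ label_diff / denom)


def readout_accuracy(hidden, labels, unembed):
    """Accuracy when the label is read out as the argmax of the label logits."""
    logits = hidden @ unembed.T
    return float((logits.argmax(axis=1) == labels).mean())


# Synthetic "query hidden states": two classes separated along axis 0.
rng = np.random.default_rng(0)
d, n = 16, 200
labels = np.repeat([0, 1], n)
hidden = rng.normal(size=(2 * n, d))
hidden[:, 0] += np.where(labels == 1, 3.0, -3.0)

# Hypothetical unembedding rows for the two label tokens.
basis = np.eye(d)
aligned_unembed = np.stack([-basis[0], basis[0]])     # reads out along axis 0
misaligned_unembed = np.stack([-basis[1], basis[1]])  # reads out along axis 1

sep = separability(hidden, labels)
for name, unembed in [("aligned", aligned_unembed), ("misaligned", misaligned_unembed)]:
    print(
        f"{name}: separability={sep:.3f} "
        f"alignment={alignment(hidden, labels, unembed):.3f} "
        f"readout_accuracy={readout_accuracy(hidden, labels, unembed):.3f}"
    )
```

In this toy setup the hidden states are equally separable in both cases, but the readout succeeds only when the separating direction lines up with the label-token directions. That is the intuition behind treating separability and alignment as distinct factors that can develop at different depths.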
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Softmax · Attention Is All You Need
