The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy

TL;DR
This paper investigates the mechanisms behind in-context learning in transformer models, showing that the ability arises from the abrupt emergence of induction heads through specific training dynamics, with implications for understanding how neural networks learn.
Contribution
The paper introduces a minimal attention-only model that captures the data dependencies of in-context learning and explains the abrupt emergence of induction heads through a sequential, nested learning process.
Findings
In-context learning is driven by the sudden appearance of induction heads.
A two-parameter model of an induction head emulates the full data distributional dependencies displayed by the attention-based network.
Sequential learning of nested logits explains abrupt transitions in attention networks.
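The copying behavior attributed to induction heads in these findings can be illustrated with a small sketch. This is a hypothetical, idealized version and not the paper's two-parameter model: the query attends to each context item by dot-product similarity, sharpened by an assumed inverse temperature `beta`, and the output is the attention-weighted average of the context labels. The function name and all parameters are illustrative assumptions.

```python
import numpy as np

def induction_head_predict(context_items, context_labels, query, num_labels, beta=10.0):
    """Hypothetical idealized induction head for in-context classification.

    Attention is a softmax over beta * (query . item) across context items;
    the output copies the (one-hot) labels of the attended items.
    """
    items = np.asarray(context_items, dtype=float)   # shape (N, D)
    labels = np.asarray(context_labels, dtype=int)   # shape (N,)
    q = np.asarray(query, dtype=float)               # shape (D,)
    scores = beta * (items @ q)                      # match the query against each item
    scores = scores - scores.max()                   # subtract max for numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()     # attention weights over context
    one_hot = np.eye(num_labels)[labels]             # shape (N, num_labels)
    return attn @ one_hot                            # predicted label distribution
```

For example, with random unit-norm items and a query equal to one of them, the output concentrates on that item's label. A larger `beta` makes attention closer to a hard lookup; a small `beta` blurs the prediction toward the average label in context.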
Abstract
Transformer models exhibit in-context learning: the ability to accurately predict the response to a novel query based on illustrative examples in the input sequence. In-context learning contrasts with traditional in-weights learning of query-output relationships. What aspects of the training data distribution and architecture favor in-context vs in-weights learning? Recent work has shown that specific distributional properties inherent in language, such as burstiness, large dictionaries and skewed rank-frequency distributions, control the trade-off or simultaneous appearance of these two forms of learning. We first show that these results are recapitulated in a minimal attention-only network trained on a simplified dataset. In-context learning (ICL) is driven by the abrupt emergence of an induction head, which subsequently competes with in-weights learning. By identifying progress…
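The distributional properties named in the abstract (burstiness, large dictionaries, skewed rank-frequency distributions) can be made concrete with a small sampler. This is a hypothetical sketch, not the paper's data pipeline: class frequencies follow a Zipf-like law with exponent `alpha`, the dictionary size is `num_classes`, and burstiness is modeled by placing `burstiness` copies of the query's class in the context. The fixed class-to-label map (`class % num_labels`) and all parameter names are assumptions.

```python
import numpy as np

def zipf_probs(num_classes, alpha):
    """Rank-frequency distribution p(k) proportional to k^(-alpha), k = 1..K."""
    ranks = np.arange(1, num_classes + 1)
    weights = ranks ** (-float(alpha))
    return weights / weights.sum()

def sample_sequence(rng, num_classes=128, num_labels=32, context_len=8,
                    burstiness=4, alpha=1.0):
    """Sample one hypothetical (context, query) sequence."""
    if not 1 <= burstiness <= context_len:
        raise ValueError("burstiness must lie in [1, context_len]")
    p = zipf_probs(num_classes, alpha)
    class_labels = np.arange(num_classes) % num_labels   # assumed fixed class -> label map
    query_class = rng.choice(num_classes, p=p)
    # `burstiness` copies of the query's class; the rest drawn independently from p.
    others = rng.choice(num_classes, size=context_len - burstiness, p=p)
    context = np.concatenate([np.full(burstiness, query_class), others])
    rng.shuffle(context)                                  # random positions in the sequence
    return context, class_labels[context], query_class, class_labels[query_class]
```

With `burstiness` at or above 1 the query's class always appears in context, so copying from context can solve the task; raising `alpha` concentrates mass on a few frequent classes, which favors memorizing them in the weights.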
Taxonomy
Topics: Neural Networks and Applications · Neural dynamics and brain function · Domain Adaptation and Few-Shot Learning
