The mechanistic basis of data dependence and abrupt learning in an in-context classification task

Gautam Reddy

arXiv:2312.03002 · cs.LG · December 7, 2023 · 2 cites

Open Access

TL;DR

This paper studies the mechanisms behind in-context learning in transformer models using a minimal attention-only network. It finds that in-context learning is driven by the abrupt emergence of an induction head, and that the abruptness is explained by the sequential learning of nested logits.

Contribution

It introduces a minimal attention-only model that captures the data dependencies of in-context learning and explains the abrupt emergence of induction heads through a sequential, nested learning process.

Findings

01

In-context learning is driven by the sudden appearance of induction heads.

02

A two-parameter model can reproduce the full dependence of in-context learning on the data distribution.

03

Sequential learning of nested logits explains abrupt transitions in attention networks.
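The induction-head behavior named in the first finding can be illustrated with a hand-built sketch: given context items with labels and a query, attend to the context item that best matches the query and copy its label. This is only an illustration of the match-and-copy idea, collapsed into a single attention step; it is not the paper's trained network or its two-parameter model, and the function name `induction_head_predict` and the sharpness parameter `beta` are hypothetical.

```python
import numpy as np

def induction_head_predict(items, labels, query, beta=10.0):
    """Match-and-copy sketch (hypothetical helper, not the paper's model).

    items:  (N, D) array of context item vectors
    labels: (N,) array of integer labels paired with each context item
    query:  (D,) query vector
    beta:   assumed sharpness of the attention softmax
    """
    # Similarity between the query and every context item.
    scores = beta * (items @ query)
    scores = scores - scores.max()  # subtract max for numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()

    # Accumulate attention mass per label and predict the heaviest label.
    votes = np.zeros(int(labels.max()) + 1)
    np.add.at(votes, labels, attn)
    return int(votes.argmax()), attn


# Usage: four orthogonal items; the query repeats the third one.
items = np.eye(4)
labels = np.array([0, 1, 2, 0])
pred, attn = induction_head_predict(items, labels, items[2])
```

In this toy case the attention concentrates on the matching context item, so the predicted label is the one paired with that item in the context rather than one stored in weights.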

Abstract

Transformer models exhibit in-context learning: the ability to accurately predict the response to a novel query based on illustrative examples in the input sequence. In-context learning contrasts with traditional in-weights learning of query-output relationships. What aspects of the training data distribution and architecture favor in-context vs in-weights learning? Recent work has shown that specific distributional properties inherent in language, such as burstiness, large dictionaries and skewed rank-frequency distributions, control the trade-off or simultaneous appearance of these two forms of learning. We first show that these results are recapitulated in a minimal attention-only network trained on a simplified dataset. In-context learning (ICL) is driven by the abrupt emergence of an induction head, which subsequently competes with in-weights learning. By identifying progress…
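The distributional properties mentioned in the abstract (burstiness, large dictionaries, skewed rank-frequency distributions) can be made concrete with a small sampling sketch. Everything specific here is an assumption made for illustration rather than the paper's stated construction: classes are drawn from a Zipf-like distribution with exponent `zipf_alpha`, each sampled class is repeated `burstiness` times in the context, and the query is drawn from a class present in the context. The function and parameter names are hypothetical.

```python
import numpy as np

def sample_bursty_context(num_classes=64, context_len=8, burstiness=2,
                          zipf_alpha=1.0, rng=None):
    """Sample class indices for one context plus a query class (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    if context_len % burstiness != 0:
        raise ValueError("context_len must be a multiple of burstiness")

    # Skewed rank-frequency distribution: p(rank) proportional to rank^(-alpha).
    ranks = np.arange(1, num_classes + 1, dtype=float)
    probs = ranks ** (-float(zipf_alpha))
    probs /= probs.sum()

    # Draw distinct classes, then repeat each one `burstiness` times.
    n_distinct = context_len // burstiness
    classes = rng.choice(num_classes, size=n_distinct, replace=False, p=probs)
    context = np.repeat(classes, burstiness)
    rng.shuffle(context)  # in-place shuffle of context positions

    # Assumption: the query's class always appears in the context.
    query = int(rng.choice(classes))
    return context, query


context, query = sample_bursty_context(rng=np.random.default_rng(0))
```

Raising `num_classes` stands in for a larger dictionary, `burstiness` controls how often a class repeats within one context, and `zipf_alpha` sets how skewed the class frequencies are.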

Taxonomy

Topics: Neural Networks and Applications · Neural dynamics and brain function · Domain Adaptation and Few-Shot Learning