A Theory of Emergent In-Context Learning as Implicit Structure Induction
Michael Hahn, Navin Goyal

TL;DR
This paper provides a theoretical framework explaining how large language models develop in-context learning capabilities through the recombination of compositional structures found in language, supported by controlled experiments and probing analyses.
Contribution
It derives information-theoretic bounds linking in-context learning to the amount of compositional structure in the pretraining data (a second bound also justifies prompting models to output intermediate steps), and validates these predictions with controlled experiments.
Findings
In-context learning emerges with increased model size and data.
Prompting models to output intermediate steps improves performance.
Models' internal representations encode compositional structures.
Abstract
Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on recombination of compositional operations found in natural language data. We derive an information-theoretic bound showing how in-context learning abilities arise from generic next-token prediction when the pretraining distribution has sufficient amounts of compositional structure, under linguistically motivated assumptions. A second bound provides a theoretical justification for the empirical success of prompting LLMs to output intermediate steps towards an answer. To validate theoretical predictions, we introduce a controlled setup for inducing in-context learning; unlike previous approaches, it accounts for the compositional nature of language.…
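As a toy illustration of the abstract's central intuition, and not the authors' controlled setup (whose details are not given here), the sketch below treats a task as a chain of primitive string operations and treats in-context learning as picking the shortest chain consistent with the demonstrations, loosely in the spirit of a description-length argument. The primitive set and the names `apply_chain`, `induce`, and `induce_stepwise` are hypothetical, introduced only for illustration. The stepwise variant assumes each demonstration also shows its intermediate outputs, mimicking a prompt that asks for intermediate steps.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical primitive operations (illustration only; not the paper's grammar).
PRIMITIVES: Dict[str, Callable[[str], str]] = {
    "reverse": lambda s: s[::-1],
    "upper": lambda s: s.upper(),
    "double": lambda s: s + s,
    "first2": lambda s: s[:2],
}


def apply_chain(chain: Tuple[str, ...], x: str) -> str:
    """Apply the named primitives to x from left to right."""
    for name in chain:
        x = PRIMITIVES[name](x)
    return x


def induce(
    demos: List[Tuple[str, str]], max_depth: int = 3
) -> Tuple[Optional[Tuple[str, ...]], int]:
    """Return the shortest chain consistent with all (input, output) demos,
    plus the number of candidate chains tested. Shorter chains are tried first,
    a crude stand-in for preferring tasks with short descriptions."""
    tested = 0
    for depth in range(1, max_depth + 1):
        for chain in product(PRIMITIVES, repeat=depth):
            tested += 1
            if all(apply_chain(chain, x) == y for x, y in demos):
                return chain, tested
    return None, tested


def induce_stepwise(
    traces: List[List[str]],
) -> Tuple[Optional[Tuple[str, ...]], int]:
    """Each trace lists the input, every intermediate output, and the final
    output. Because each step is visible, primitives are identified one step
    at a time instead of searching over whole chains."""
    tested = 0
    chain: List[str] = []
    n_steps = len(traces[0]) - 1
    for i in range(n_steps):
        for name, f in PRIMITIVES.items():
            tested += 1
            if all(f(t[i]) == t[i + 1] for t in traces):
                chain.append(name)
                break
        else:
            # No single primitive explains this step in every trace.
            return None, tested
    return tuple(chain), tested
```

For the demonstrations ("abc" → "CBA") and ("hello" → "OLLEH"), both functions recover the chain ("reverse", "upper"); the direct search tests 6 candidate chains, while the stepwise search, given the intermediate outputs "cba" and "olleh", tests 3. With n primitives and a depth-k chain, direct search may test up to n + n^2 + ... + n^k chains, whereas the stepwise search tests at most n·k. This counting is only an informal analogy for why exposing intermediate steps can help, not the paper's bound.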
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Language and cultural evolution
