A Theory of Emergent In-Context Learning as Implicit Structure Induction

Michael Hahn; Navin Goyal

arXiv:2303.07971 · cs.CL · March 15, 2023 · 5 cites

Open Access

TL;DR

This paper provides a theoretical framework explaining how large language models develop in-context learning capabilities through the recombination of linguistic structures, supported by experiments with controlled setups and probing analyses.

Contribution

It introduces an information-theoretic model linking in-context learning to the presence of compositional structure in training data, and validates it with controlled experiments.

Findings

01. In-context learning emerges with increased model size and data.

02. Prompting models to output intermediate steps improves performance.

03. Models' internal representations encode compositional structures.

Abstract

Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on recombination of compositional operations found in natural language data. We derive an information-theoretic bound showing how in-context learning abilities arise from generic next-token prediction when the pretraining distribution has sufficient amounts of compositional structure, under linguistically motivated assumptions. A second bound provides a theoretical justification for the empirical success of prompting LLMs to output intermediate steps towards an answer. To validate theoretical predictions, we introduce a controlled setup for inducing in-context learning; unlike previous approaches, it accounts for the compositional nature of language.…

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Language and cultural evolution