Next-token pretraining implies in-context learning

Paul M. Riechers; Henry R. Bigelow; Eric A. Alt; Adam Shai

arXiv:2505.18373 · cs.LG · July 15, 2025

TL;DR

This paper argues that in-context learning (ICL) follows predictably from standard next-token pretraining. It presents an information-theoretic framework that predicts in-distribution ICL dynamics and tests it on synthetic datasets, reproducing phase transitions in training loss and power-law scaling of in-context loss.

Contribution

The paper gives an information-theoretic account of in-distribution in-context learning as a predictable consequence of standard next-token pretraining, rather than an exotic emergent property.

Findings

01 Predicts in-distribution ICL dynamics (context-dependent loss reduction) with an information-theoretic framework

02 Reproduces the phase transitions in training loss associated with induction head formation

03 Shows power-law scaling of in-context loss on synthetic datasets

Abstract

We argue that in-context learning (ICL) predictably arises from standard self-supervised next-token pretraining, rather than being an exotic emergent property. This work establishes the foundational principles of this emergence by focusing on in-distribution ICL, demonstrating how models necessarily adapt to context when trained on token sequences, especially from non-ergodic sources. Our information-theoretic framework precisely predicts these in-distribution ICL dynamics (i.e., context-dependent loss reduction). We verify this with experiments using synthetic datasets of differing types of correlational structure, reproducing characteristic phenomena like phase transitions in training loss for induction head formation and power-law scaling of in-context loss. We further show that a model's in-context performance on any task is mathematically coupled to the ensemble of tasks seen in…
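The idea of context-dependent loss reduction on a non-ergodic source can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a hypothetical source that first picks one of two biased coins and then emits i.i.d. bits from it, and it computes the exact expected next-token loss of the Bayes-optimal predictor at each context length. The names `binary_entropy` and `expected_loss_by_position`, and the biases 0.2 and 0.8, are illustrative choices only.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_loss_by_position(biases, prior, max_context):
    """Expected next-token loss (bits) of the Bayes-optimal predictor
    after seeing n = 0..max_context tokens from a mixture of coins.

    Assumption (not from the paper): the source draws one coin from
    `prior` and then emits i.i.d. bits with that coin's bias, so the
    number of ones seen so far is a sufficient statistic.
    """
    losses = []
    for n in range(max_context + 1):
        loss = 0.0
        for k in range(n + 1):  # k = number of ones in the context
            # Joint probability of (coin i, k ones among n tokens).
            joint = [
                w * math.comb(n, k) * b**k * (1 - b) ** (n - k)
                for b, w in zip(biases, prior)
            ]
            p_k = sum(joint)
            if p_k == 0.0:
                continue
            # Posterior-predictive probability that the next token is 1.
            p_next_one = sum(j * b for j, b in zip(joint, biases)) / p_k
            loss += p_k * binary_entropy(p_next_one)
        losses.append(loss)
    return losses


biases = [0.2, 0.8]   # hypothetical coin biases
prior = [0.5, 0.5]    # each coin equally likely
losses = expected_loss_by_position(biases, prior, max_context=50)

# Loss an oracle would pay if it knew which coin generated the sequence.
floor = sum(w * binary_entropy(b) for b, w in zip(biases, prior))

for n in (0, 1, 2, 5, 10, 50):
    print(f"context length {n:2d}: expected loss = {losses[n]:.4f} bits")
print(f"per-coin entropy floor: {floor:.4f} bits")
```

In this toy setting the expected loss starts at the entropy of the marginal next-token distribution and falls with context length toward the entropy of the selected coin, because the context reveals which component generated the sequence. A model that minimizes next-token loss on such data has to track that information, which is the sense of in-distribution ICL used in the abstract. The synthetic datasets and scaling behavior studied in the paper are richer than this two-coin example.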
