What do Language Models Learn and When? The Implicit Curriculum Hypothesis

Emmy Liu; Kaiser Sun; Millicent Li; Isabelle Lee; Lindia Tjuatja; Jen-tse Huang; Graham Neubig

arXiv:2604.08510·cs.CL·April 10, 2026

TL;DR

This paper investigates the emergence of skills in large language models during pretraining, revealing a predictable, compositional curriculum that is consistent across models and encoded in their internal representations.

Contribution

It introduces the Implicit Curriculum Hypothesis and demonstrates that skill emergence follows a compositional and predictable order across different model sizes and data mixtures.

Findings

01

Emergence orderings are highly consistent across models ($\rho = .81$).

02

Composite tasks emerge after their component tasks.

03

Model representations encode the structure of skill emergence, enabling trajectory prediction.
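The second finding can be phrased as a simple check on emergence points. The following is a minimal sketch, not the authors' code: the task names, the emergence steps, and the helper `composition_violations` are all hypothetical, and a composite that emerges at the same checkpoint as a component is treated as consistent with the finding.

```python
from typing import Dict, List, Optional, Tuple


def composition_violations(
    emergence: Dict[str, Optional[int]],
    components: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """List (composite, component) pairs where the composite emerged first.

    `emergence` maps a task name to the training step at which it emerged,
    or None if it never emerged. `components` maps each composite task to
    the tasks it is built from. Both structures are illustrative assumptions.
    """
    violations = []
    for composite, parts in components.items():
        composite_step = emergence.get(composite)
        if composite_step is None:
            continue  # composite never emerged, so there is nothing to check
        for part in parts:
            part_step = emergence.get(part)
            # A component that never emerged, or emerged strictly later,
            # contradicts "components first".
            if part_step is None or part_step > composite_step:
                violations.append((composite, part))
    return violations


# Hypothetical emergence steps for one model (made-up numbers).
emergence = {
    "uppercase": 2000,
    "reverse": 4000,
    "reverse_then_uppercase": 8000,
    "add": 4000,
    "double": 8000,
    "add_then_double": 2000,  # deliberately out of order
}
components = {
    "reverse_then_uppercase": ["reverse", "uppercase"],
    "add_then_double": ["add", "double"],
}

violations = composition_violations(emergence, components)
print(violations)
```

In this toy data only `add_then_double` is flagged, because it is listed as emerging before both of its components.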

Abstract

Large language models (LLMs) can perform remarkably complex tasks, yet the fine-grained details of how these capabilities emerge during pretraining remain poorly understood. Scaling laws on validation loss tell us how much a model improves with additional compute, but not what skills it acquires in which order. To remedy this, we propose the Implicit Curriculum Hypothesis: pretraining follows a compositional and predictable curriculum across models and data mixtures. We test this by designing a suite of simple, composable tasks spanning retrieval, morphological transformations, coreference, logical reasoning, and mathematics. Using these tasks, we track emergence points across four model families spanning sizes from 410M to 13B parameters. We find that emergence orderings of when models reach fixed accuracy thresholds are strikingly consistent ($\rho = .81$ across 45 model pairs), and that…
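To make "emergence point" and the reported rank correlation concrete, here is a minimal sketch under stated assumptions. The threshold of 0.5, the task names, and the accuracy curves are invented for illustration. Only the general idea comes from the abstract: find the first checkpoint that reaches a fixed accuracy threshold, then compare the resulting orderings across models with Spearman's $\rho$. It is not the authors' evaluation code.

```python
from typing import Dict, List, Optional, Sequence


def emergence_step(
    steps: Sequence[int], accuracies: Sequence[float], threshold: float = 0.5
) -> Optional[int]:
    """First checkpoint step whose accuracy reaches `threshold`, else None."""
    for step, acc in zip(steps, accuracies):
        if acc >= threshold:
            return step
    return None


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical checkpoints and per-task accuracy curves for two models.
steps = [1000, 2000, 4000, 8000]
curves_a: Dict[str, List[float]] = {
    "copy": [0.7, 0.9, 0.95, 0.99],
    "plural": [0.1, 0.6, 0.8, 0.9],
    "coref": [0.0, 0.2, 0.55, 0.7],
    "add": [0.0, 0.1, 0.3, 0.6],
}
curves_b: Dict[str, List[float]] = {
    "copy": [0.2, 0.8, 0.9, 0.95],
    "plural": [0.6, 0.7, 0.8, 0.9],
    "coref": [0.1, 0.3, 0.5, 0.8],
    "add": [0.0, 0.0, 0.2, 0.7],
}

emerge_a = {task: emergence_step(steps, acc) for task, acc in curves_a.items()}
emerge_b = {task: emergence_step(steps, acc) for task, acc in curves_b.items()}

# Compare only tasks that emerged in both models.
shared = [t for t in emerge_a if emerge_a[t] is not None and emerge_b.get(t) is not None]
rho = spearman_rho([emerge_a[t] for t in shared], [emerge_b[t] for t in shared])
print(f"rho = {rho:.2f}")
```

Averaging such pairwise values over every pair of models would give a single summary number of the kind quoted above. How the paper handles ties and tasks that never emerge is not stated in this excerpt, so the choices here (average ranks, dropping unemerged tasks) are assumptions.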

Code & Models

Repositories

kaiserwholearns/ElementalTask
github

Models

tekkmaven/representation-learning-dynamics
model

Datasets

elemental-tasks/model-trajectories
dataset · 7.5k dl
