From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models

Shubhra Mishra; Gabriel Poesia; Noah D. Goodman

arXiv:2407.00900 · cs.AI · December 15, 2025

TL;DR

This paper analyzes how mathematical reasoning skills develop in large language models during training, revealing that skills emerge in an order similar to human curricula and examining the effects of instruction tuning.

Contribution

The paper provides the first detailed analysis of the training dynamics of mathematical reasoning in open-weight LLMs, using a novel synthetic dataset.
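The contribution rests on a synthetic dataset of problems tied to fine-grained skills. The following is a minimal sketch of what skill-grounded generation with exactly checkable answers could look like, assuming two hypothetical skills and a plain template generator. It is not the MathCAMPS pipeline, whose generation procedure is not described on this page; the skill ids, `Problem`, `GENERATORS`, and `check_answer` are all illustrative names.

```python
import random
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Dict


@dataclass(frozen=True)
class Problem:
    skill: str        # hypothetical skill identifier
    question: str     # natural-language prompt shown to the model
    answer: Fraction  # exact answer, computed symbolically


def add_fractions(rng: random.Random) -> Problem:
    """Hypothetical skill: add two positive fractions."""
    n1, d1 = rng.randint(1, 9), rng.randint(2, 12)
    n2, d2 = rng.randint(1, 9), rng.randint(2, 12)
    return Problem(
        "add_fractions",
        f"What is {n1}/{d1} + {n2}/{d2}?",
        Fraction(n1, d1) + Fraction(n2, d2),
    )


def one_step_equation(rng: random.Random) -> Problem:
    """Hypothetical skill: solve x + a = b for x."""
    x, a = rng.randint(-20, 20), rng.randint(1, 30)
    return Problem(
        "one_step_equation",
        f"Solve for x: x + {a} = {x + a}.",
        Fraction(x),
    )


# Registry mapping skill ids to generators (ids are illustrative only).
GENERATORS: Dict[str, Callable[[random.Random], Problem]] = {
    "add_fractions": add_fractions,
    "one_step_equation": one_step_equation,
}


def check_answer(problem: Problem, model_output: str) -> bool:
    """Exact grading: parse the output as a rational number and compare."""
    try:
        return Fraction(model_output.strip()) == problem.answer
    except (ValueError, ZeroDivisionError):
        # Unparseable text (e.g. "x = 5") or a zero denominator counts as wrong.
        return False


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so samples are reproducible
    for skill, gen in GENERATORS.items():
        p = gen(rng)
        print(skill, "|", p.question, "|", p.answer)
```

Because each answer is computed from the sampled numbers rather than written by hand, every generated problem can be graded automatically, and new problems can be sampled on demand so they are unlikely to appear verbatim in pre-training data.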

Findings

1. Mathematical skills develop during pre-training in an order that correlates with the human curriculum.

2. Instruction tuning enhances some mathematical abilities but can impair others.

3. Training data order influences the emergence of reasoning skills.

Abstract

Large Language Models (LLMs) trained solely on next-token prediction learn to solve a wide range of problems involving mathematical reasoning. But how does this ability evolve during training? We present the first analysis of how the mathematical reasoning abilities of several open-weight LLMs develop during pre-training and post-training. To this end, we construct MathCAMPS, a synthetic dataset of novel mathematical reasoning problems grounded in 44 fine-grained skills taken from the Common Core curriculum from kindergarten through 8th grade. In one experiment, we show that mathematical skills are learned during pre-training in an order that measurably correlates with the human-designed curriculum, even though the training data are randomly ordered. We also present a detailed analysis of which mathematical abilities benefit from instruction tuning, a widely used post-training method, and, in contrast, which skills…
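The abstract's central claim is that skills are acquired in an order that correlates with the curriculum. This page does not say how "learned" is defined or which correlation statistic the authors use. The sketch below assumes that a skill counts as learned at the first checkpoint where its accuracy reaches a fixed threshold (a hypothetical 0.5), and it measures agreement with grade level using Spearman's rank correlation. All skill names, grades, and accuracy values are hypothetical illustrations, not results from the paper.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence


def emergence_step(curve: Dict[int, float], threshold: float) -> Optional[int]:
    """First checkpoint step at which accuracy reaches the threshold, else None."""
    for step in sorted(curve):
        if curve[step] >= threshold:
            return step
    return None


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if either input is constant (correlation undefined).
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def demo() -> float:
    # Hypothetical skills with grade levels (K = 0); not data from the paper.
    grades = {"compare_numbers": 0, "add_within_20": 1, "multiply_within_100": 3,
              "add_fractions": 5, "solve_linear": 8}
    # Hypothetical accuracy per skill at each pre-training checkpoint step.
    curves = {
        "compare_numbers": {1000: 0.72, 2000: 0.81, 4000: 0.88},
        "add_within_20": {1000: 0.41, 2000: 0.63, 4000: 0.79},
        "multiply_within_100": {1000: 0.12, 2000: 0.44, 4000: 0.66},
        "add_fractions": {1000: 0.05, 2000: 0.21, 4000: 0.52},
        "solve_linear": {1000: 0.02, 2000: 0.09, 4000: 0.23},
    }
    learned = {s: emergence_step(c, threshold=0.5) for s, c in curves.items()}
    skills = [s for s in grades if learned[s] is not None]  # drop never-learned skills
    return spearman([grades[s] for s in skills], [learned[s] for s in skills])


if __name__ == "__main__":
    print(round(demo(), 4))
```

On this toy data four skills cross the threshold and the script prints 0.9487. Average ranks are used instead of plain ordinals because ties are common in this setting: several skills can share a grade, and two skills can first cross the threshold at the same checkpoint.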

Code & Models

Repositories

gpoesia/mathcamps (PyTorch, official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning

Methods: Sparse Evolutionary Training · Pythia