From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models
Shubhra Mishra, Gabriel Poesia, Noah D. Goodman

TL;DR
This paper analyzes how mathematical reasoning skills develop in large language models during training, revealing that skills emerge in an order similar to human curricula and examining the effects of instruction tuning.
Contribution
It provides the first detailed analysis of the training dynamics of mathematical reasoning in open-weight LLMs using a novel synthetic dataset.
Findings
Mathematical skills develop during pre-training in an order correlating with the human curriculum.
Instruction tuning enhances some mathematical abilities but can impair others.
Skills emerge in this curriculum-like order even though the training data are randomly ordered.
Abstract
Large Language Models (LLMs) solely trained on next-token prediction learn to solve a wide range of problems involving mathematical reasoning. But how does this ability evolve during training? We show the first analysis of how mathematical reasoning abilities of several open-weight LLMs develop during pre-training and post-training. To this end, we construct MathCAMPS, a synthetic dataset of novel mathematical reasoning problems grounded in 44 fine-grained skills taken from the Common Core curriculum from K to 8th grades. In one experiment, we show that mathematical skills are learned during pre-training in an order that measurably correlates with the human-designed curriculum, even though training data are randomly ordered. We also show a detailed analysis of which mathematical abilities benefit from instruction tuning, a widely used post-training method and, in contrast, which skills…
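The abstract's claim that skill acquisition order "measurably correlates" with the curriculum can be illustrated with a rank correlation. The sketch below is only an assumption about how such a comparison might be set up: this summary does not state the paper's actual metric, Spearman's rho is swapped in as a common choice, and the names `grade_levels` and `first_learned_step` and all their values are hypothetical toy data, not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data (NOT from the paper): one entry per skill.
# grade_levels: curriculum grade of each skill (0 = kindergarten).
# first_learned_step: made-up checkpoint step at which a model first
# passes some accuracy threshold on that skill.
grade_levels = [0, 1, 2, 3, 4, 5]
first_learned_step = [1000, 2000, 2000, 8000, 4000, 16000]

rho = spearman(grade_levels, first_learned_step)
print(f"Spearman rho = {rho:.3f}")
```

A rho near 1 would mean skills from earlier grades tend to be learned at earlier checkpoints; a rho near 0 would mean no relationship between grade level and learning order.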
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Sparse Evolutionary Training · Pythia
