What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning

Yaning Jia; Chunhui Zhang; Xingjian Diao; Xiangchi Yuan; Zhongyu Ouyang; Chiyu Ma; Soroush Vosoughi

arXiv:2510.19099·cs.LG·October 28, 2025

TL;DR

This paper investigates how different data ordering strategies in curriculum learning affect large language models' mathematical reasoning, revealing that effectiveness depends on model and task specifics, and that no single approach is universally best.

Contribution

It introduces a unified evaluation framework for curriculum difficulty and systematically analyzes the effects of data ordering on LLM reasoning performance across multiple models and metrics.

Findings

01 No universal curriculum strategy is optimal for all scenarios.

02 Effectiveness of forward vs. reverse curriculum depends on model and task complexity.

03 Decision-uncertain samples can enhance learning outcomes.

Abstract

Curriculum learning (CL), ordering training data from easy to hard, has become a popular strategy for improving reasoning in large language models (LLMs). Yet prior work employs disparate difficulty metrics and training setups, leaving open fundamental questions: When does curriculum help? Which direction, forward or reverse, is better? And does the answer depend on what we measure? We address these questions through a unified offline evaluation framework that decomposes curriculum difficulty into five complementary dimensions: Problem Difficulty, Model Surprisal, Confidence Margin, Predictive Uncertainty, and Decision Variability. Through controlled post-training experiments on mathematical reasoning benchmarks with Llama3.1-8B, Mistral-7B, and Gemma3-4B, we find that (i) no curriculum strategy dominates universally: the relative effectiveness of forward versus reverse CL depends…
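
To make the forward-versus-reverse distinction concrete, here is a minimal sketch of how a dataset might be scored and ordered offline. The paper's exact metric definitions are not given in this excerpt, so the formulas below are common textbook stand-ins (negative log-probability for surprisal, top-two probability gap for confidence margin, Shannon entropy for predictive uncertainty), and every function name, field name, and toy value is a hypothetical chosen for illustration. Problem Difficulty and Decision Variability are left out because they would need dataset labels or repeated sampling.

```python
import math
from typing import Callable, Dict, List, Sequence


def surprisal(probs: Sequence[float], target: int) -> float:
    """Negative log-probability of the reference answer (assumed definition)."""
    return -math.log(probs[target])


def confidence_margin(probs: Sequence[float]) -> float:
    """Gap between the two most probable options (assumed definition)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of the predictive distribution, in nats (assumed definition)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def order_curriculum(
    samples: List[Dict],
    difficulty: Callable[[Dict], float],
    reverse: bool = False,
) -> List[Dict]:
    """Sort samples easy-to-hard (forward) or hard-to-easy (reverse=True)."""
    return sorted(samples, key=difficulty, reverse=reverse)


# Hypothetical toy samples: a model's probabilities over three answer options.
toy = [
    {"id": "a", "probs": [0.90, 0.05, 0.05], "target": 0},
    {"id": "b", "probs": [0.40, 0.35, 0.25], "target": 1},
    {"id": "c", "probs": [0.60, 0.30, 0.10], "target": 0},
]


def by_surprisal(sample: Dict) -> float:
    return surprisal(sample["probs"], sample["target"])


forward_ids = [s["id"] for s in order_curriculum(toy, by_surprisal)]
reverse_ids = [s["id"] for s in order_curriculum(toy, by_surprisal, reverse=True)]
```

Swapping `by_surprisal` for a function built on `confidence_margin` or `predictive_entropy` gives a different ordering of the same data, which is the sense in which the choice of difficulty dimension and the choice of direction are separate knobs.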
