Rethinking Easy-to-Hard: Limits of Curriculum Learning in Post-Training for Deductive Reasoning
Maximilian Mordig, Andreas Opedal, Weiyang Liu, Bernhard Schölkopf

TL;DR
This paper empirically investigates the effectiveness of curriculum learning in post-training large language models for deductive reasoning, finding negligible benefits of difficulty-based sequencing over random sampling.
Contribution
It provides a systematic empirical analysis showing that curriculum learning offers limited advantages in deductive reasoning tasks during post-training of LLMs.
Findings
No robust advantage of curriculum over random sampling in accuracy.
No robust advantage of curriculum over random sampling in response length.
Findings are consistent across supervised fine-tuning and reinforcement learning.
Abstract
Curriculum learning (CL), motivated by the intuition that learning in increasing order of difficulty should ease generalization, is commonly adopted both in pre-training and post-training of large language models (LLMs). The intuition of CL is particularly compelling for compositional reasoning, where complex problems are built from elementary inference rules; however, the actual impact of CL on such tasks remains largely underexplored. We present a systematic empirical study of CL for post-training of LLMs, using synthetic arithmetic and logical benchmarks where difficulty is characterized by reasoning complexity rather than surface-level proxies. Surprisingly, across multiple model families and curriculum schedules, we find no robust advantage in difficulty-based sequencing over standard random sampling in either accuracy or response length. These findings persist across both…
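To make the comparison in the abstract concrete, here is a minimal sketch of the two data orderings being contrasted: an easy-to-hard curriculum and a random-sampling baseline. It is illustrative only. The example format, the function names `curriculum_order` and `random_order`, and the use of a per-example integer difficulty level are assumptions for this sketch; the schedules actually used in the paper are not described in this summary.

```python
import random
from typing import List, Tuple

# Hypothetical example format: (problem text, difficulty level).
# Difficulty is assumed to count reasoning steps, not surface features.
Example = Tuple[str, int]


def curriculum_order(examples: List[Example]) -> List[Example]:
    """Easy-to-hard ordering: stable sort by difficulty level."""
    return sorted(examples, key=lambda ex: ex[1])


def random_order(examples: List[Example], seed: int = 0) -> List[Example]:
    """Baseline ordering: uniform random shuffle that ignores difficulty."""
    rng = random.Random(seed)
    shuffled = list(examples)  # copy so the input list is left unchanged
    rng.shuffle(shuffled)
    return shuffled


# Toy arithmetic problems with assumed difficulty labels.
toy = [("a+b+c+d", 3), ("a+b", 1), ("a+b+c", 2), ("c+d", 1)]
easy_to_hard = curriculum_order(toy)
baseline = random_order(toy, seed=0)
```

Both functions return the same training examples and differ only in presentation order, which is the variable the study isolates.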
