Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum
Nived Rajaraman, Audrey Huang, Miro Dudik, Robert Schapire, Dylan J. Foster, Akshay Krishnamurthy

TL;DR
This paper shows that autocurriculum, an adaptive training method in which the model uses its own performance to choose which problems to train on, provably reduces the data and compute costs of training reasoning models relative to standard supervised fine-tuning and reinforcement learning recipes.
Contribution
It shows that autocurriculum is a provably effective approach that improves training efficiency for reasoning models in both supervised fine-tuning and reinforcement learning settings.
Findings
In supervised fine-tuning, autocurriculum requires exponentially fewer reasoning demonstrations than non-adaptive fine-tuning.
In reinforcement learning, autocurriculum decouples training cost from reference model quality.
Adaptive data selection based on model performance enhances training efficiency.
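As a rough illustration of the adaptive data selection described in the findings above, here is a toy Python sketch. It is not the paper's algorithm: the function names (`estimate_success`, `select_for_training`), the `threshold` and `n_attempts` parameters, and the toy model with fixed per-problem success probabilities are all assumptions made for illustration. The only idea taken from the source is that the model's own measured performance decides which problems receive further training effort.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: attempt(problem, rng) returns True if the current
# model solves the problem on one sampled try. This is an assumption of the
# sketch, not an API from the paper.
Attempt = Callable[[str, random.Random], bool]


def estimate_success(problem: str, attempt: Attempt,
                     n_attempts: int, rng: random.Random) -> float:
    """Empirical success rate of the current model on one problem."""
    wins = sum(1 for _ in range(n_attempts) if attempt(problem, rng))
    return wins / n_attempts


def select_for_training(problems: List[str], attempt: Attempt,
                        n_attempts: int = 8, threshold: float = 0.5,
                        seed: int = 0) -> Tuple[List[str], Dict[str, float]]:
    """Keep only the problems whose estimated success rate is below threshold.

    In a real pipeline, only these problems would be sent for reasoning
    demonstrations or further training; here we simply return them.
    """
    rng = random.Random(seed)
    rates = {p: estimate_success(p, attempt, n_attempts, rng) for p in problems}
    selected = [p for p in problems if rates[p] < threshold]
    return selected, rates


# Toy stand-in for a model: a fixed success probability per problem.
toy_skill = {"easy": 1.0, "hard": 0.0}


def toy_attempt(problem: str, rng: random.Random) -> bool:
    # rng.random() lies in [0.0, 1.0), so skill 1.0 always succeeds
    # and skill 0.0 always fails.
    return rng.random() < toy_skill[problem]


selected, rates = select_for_training(list(toy_skill), toy_attempt)
print(selected, rates)
```

A non-adaptive baseline would request demonstrations for every problem regardless of `rates`; the sketch only shows how measured performance can narrow that set.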
Abstract
Chain-of-thought reasoning, where language models expend additional computation by producing thinking tokens prior to final responses, has driven significant advances in model capabilities. However, training these reasoning models is extremely costly in terms of both data and compute, as it involves collecting long traces of reasoning behavior from humans or synthetic generators and further post-training the model via reinforcement learning. Are these costs fundamental, or can they be reduced through better algorithmic design? We show that autocurriculum, where the model uses its own performance to decide which problems to focus training on, provably improves upon standard training recipes for both supervised fine-tuning (SFT) and reinforcement learning (RL). For SFT, we show that autocurriculum requires exponentially fewer reasoning demonstrations than non-adaptive fine-tuning, by…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
