TL;DR
This paper provides a theoretical framework demonstrating that curriculum strategies during post-training significantly reduce sample complexity for reasoning tasks in large language models, supported by empirical simulations.
Contribution
It introduces a formal analysis showing exponential improvements in sample complexity with curriculum methods and provides a new reasoning tree model for understanding these effects.
Findings
Curriculum strategies achieve polynomial sample complexity, whereas non-curriculum methods require exponentially many samples.
Reinforcement learning finetuning with curricula improves reasoning accuracy.
Empirical simulations support the theoretical guarantees.
Abstract
Recent curriculum techniques in the post-training stage of LLMs have been empirically observed to outperform non-curriculum approaches in improving reasoning performance, yet a principled understanding of their effectiveness and limitations remains incomplete. To bridge this gap, we develop an abstract theoretical framework and identify sufficient conditions under which curriculum post-training yields exponential improvements in sample complexity. To substantiate this framework, we model the base model's Chain-of-Thought generation as a state-conditioned autoregressive reasoning tree, and formalize curriculum subtasks as either depth-increasing curricula that progressively extend reasoning horizons or hint-decreasing curricula that gradually remove partial hints. Our analysis shows that reinforcement learning finetuning with both curriculum strategies achieves high accuracy with…
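The gap between exponential and polynomial sample complexity can be illustrated with a toy calculation. The sketch below is not the paper's reasoning tree model or its bounds. It assumes, purely for illustration, a chain of `depth` reasoning steps in which the base model guesses each not-yet-learned step correctly with independent probability `p`, and a reward that is given only when the whole chain is correct. The function names and the parameter `p` are hypothetical.

```python
def expected_rollouts_no_curriculum(p: float, depth: int) -> float:
    """Expected rollouts until the first rewarded chain when all `depth`
    steps must be guessed correctly at once (toy assumption: independent
    per-step success probability p)."""
    if not 0.0 < p <= 1.0 or depth < 1:
        raise ValueError("need 0 < p <= 1 and depth >= 1")
    # Success probability of a full chain is p**depth, so the waiting
    # time is geometric with mean 1 / p**depth: exponential in depth.
    return 1.0 / (p ** depth)


def expected_rollouts_depth_curriculum(p: float, depth: int) -> float:
    """Expected rollouts under a depth-increasing curriculum in which
    stage k only has to discover step k (toy assumption: steps 1..k-1
    were already learned in earlier stages and are reproduced reliably)."""
    if not 0.0 < p <= 1.0 or depth < 1:
        raise ValueError("need 0 < p <= 1 and depth >= 1")
    # Each stage waits 1 / p rollouts on average; there are `depth` stages,
    # so the total grows linearly in depth.
    return depth * (1.0 / p)


if __name__ == "__main__":
    for depth in (5, 10, 20):
        flat = expected_rollouts_no_curriculum(0.5, depth)
        staged = expected_rollouts_depth_curriculum(0.5, depth)
        print(f"depth={depth}: no curriculum ~{flat:.0f}, curriculum ~{staged:.0f}")
```

Under the same toy assumption, a hint-decreasing curriculum behaves analogously: if each stage removes one hinted step, only one new step has to be discovered per stage.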
