AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting
Renda Li, Hailang Huang, Fei Wei, Feng Xiong, Yong Wang, Xiangxiang Chu

TL;DR
AdaCuRL introduces an adaptive curriculum reinforcement learning framework that dynamically aligns data difficulty with model capability, mitigating issues like gradient starvation and policy degradation, leading to improved reasoning performance in large language models.
Contribution
The paper presents AdaCuRL, a novel adaptive curriculum RL method that incorporates difficulty estimation, data revisitation, and strategies to prevent policy degradation, addressing key challenges in training large language models.
Findings
Achieves significant performance improvements on reasoning benchmarks.
Effectively mitigates catastrophic forgetting and policy degradation.
Demonstrates robustness across diverse reasoning tasks.
Abstract
Reinforcement learning (RL) has demonstrated considerable potential for enhancing reasoning in large language models (LLMs). However, existing methods suffer from Gradient Starvation and Policy Degradation when training directly on samples of mixed difficulty. To mitigate this, prior approaches leverage Chain-of-Thought (CoT) data, but constructing high-quality CoT annotations remains labor-intensive. Alternatively, curriculum learning strategies have been explored, but they frequently encounter challenges such as difficulty mismatch, reliance on manual curriculum design, and catastrophic forgetting. To address these issues, we propose AdaCuRL, an Adaptive Curriculum Reinforcement Learning framework that integrates coarse-to-fine difficulty estimation with adaptive curriculum scheduling. This approach dynamically aligns data difficulty with model capability and incorporates a data…
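To make "Gradient Starvation" and "aligning data difficulty with model capability" concrete, here is a minimal sketch. The abstract does not name the RL algorithm or the exact difficulty estimator, so this assumes a GRPO-style group-relative advantage, `(r - mean) / (std + eps)` over k rollouts per prompt. Under that assumption, a prompt whose rollouts all succeed or all fail yields zero advantage and therefore no gradient. The pass-rate difficulty score (`estimate_difficulty`) and the window-based scheduler (`select_by_capability`, with its `width` parameter) are hypothetical illustrations, not AdaCuRL's coarse-to-fine estimator or its actual scheduling rule.

```python
import statistics
from typing import Dict, List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # Assumed GRPO-style group-relative advantage: (r - mean) / (std + eps).
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def is_starved(rewards: List[float]) -> bool:
    # If every rollout in the group earns the same reward, all advantages
    # are zero and the prompt contributes no policy gradient.
    return all(a == 0.0 for a in group_advantages(rewards))


def estimate_difficulty(rewards: List[float]) -> float:
    # Hypothetical difficulty score: 1 - empirical pass rate over k rollouts.
    return 1.0 - statistics.fmean(rewards)


def select_by_capability(
    difficulty: Dict[str, float], model_accuracy: float, width: float = 0.2
) -> List[str]:
    # Hypothetical scheduler: keep prompts whose difficulty lies within
    # `width` of the model's current error rate (1 - accuracy), deferring
    # prompts that are currently too easy or too hard.
    target = 1.0 - model_accuracy
    return sorted(pid for pid, d in difficulty.items() if abs(d - target) <= width)


if __name__ == "__main__":
    # Binary rewards (1.0 = correct) for 4 rollouts per prompt.
    rollouts = {
        "easy": [1.0, 1.0, 1.0, 1.0],
        "mid": [1.0, 0.0, 1.0, 0.0],
        "hard": [0.0, 0.0, 0.0, 0.0],
    }
    print([p for p, r in rollouts.items() if is_starved(r)])  # ['easy', 'hard']
    difficulty = {p: estimate_difficulty(r) for p, r in rollouts.items()}
    print(select_by_capability(difficulty, model_accuracy=0.5))  # ['mid']
```

In this toy run, only the "mid" prompt produces a non-zero learning signal, which is the intuition behind matching sample difficulty to what the model can currently solve.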
Taxonomy
Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Multimodal Machine Learning Applications
