Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective

Deyang Kong; Qi Guo; Xiangyu Xi; Wei Wang; Jingang Wang; Xunliang Cai; Shikun Zhang; Wei Ye

arXiv:2505.17652·cs.LG·February 2, 2026

TL;DR

This paper proposes CDAS, a novel sampling method for reinforcement learning in large language models that improves reasoning accuracy and efficiency by aligning problem difficulty with model competence using historical performance data.

Contribution

It introduces a competence-difficulty alignment sampling method that provides stable difficulty estimation and adaptive problem selection based on model competence.

Findings

1. CDAS achieves higher accuracy than baseline methods.

2. CDAS significantly improves training efficiency.

3. CDAS is 2.33 times faster than the Dynamic Sampling strategy.

Abstract

Reinforcement learning shows potential for enhancing the reasoning abilities of large language models, yet it is hard to scale because of low sample efficiency during the rollout phase. Existing methods attempt to improve efficiency by scheduling problems according to their difficulty. However, these approaches suffer from unstable and biased estimates of problem difficulty and fail to capture the alignment between model competence and problem difficulty in RL training, leading to suboptimal results. To address these limitations, this paper introduces Competence-Difficulty Alignment Sampling (CDAS), which enables accurate and stable estimation of problem difficulty by aggregating the historical performance discrepancies of problems. The model's competence is then quantified to adaptively select problems whose difficulty is in alignment…
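The sampling loop the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract is truncated and does not give formulas, so the specific choices here are assumptions made only for illustration. In particular, the sketch assumes that a problem's "performance discrepancy" is the gap between an expected pass rate and the observed pass rate, that the expected pass rate is a sigmoid of competence minus difficulty, that difficulty is the running mean of those discrepancies, and that competence is the negative mean difficulty of the problems seen so far. The names `AlignmentSampler`, `ProblemStats`, `update`, and `select` are hypothetical.

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    """Logistic function, used here as an assumed expected-pass-rate model."""
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class ProblemStats:
    """Running history for one problem (hypothetical bookkeeping)."""
    discrepancy_sum: float = 0.0
    rounds: int = 0

    @property
    def difficulty(self) -> float:
        # Assumption: difficulty is the mean of historical discrepancies,
        # so one noisy rollout cannot swing the estimate on its own.
        return self.discrepancy_sum / self.rounds if self.rounds else 0.0


class AlignmentSampler:
    def __init__(self, problem_ids):
        self.stats = {pid: ProblemStats() for pid in problem_ids}

    def competence(self) -> float:
        # Assumption: competence is the negative mean difficulty of the
        # problems that have been rolled out at least once.
        seen = [s.difficulty for s in self.stats.values() if s.rounds]
        return -sum(seen) / len(seen) if seen else 0.0

    def select(self, batch_size: int):
        # Pick the problems whose difficulty is closest to current competence.
        c = self.competence()
        ranked = sorted(self.stats, key=lambda pid: abs(c - self.stats[pid].difficulty))
        return ranked[:batch_size]

    def update(self, pass_rates: dict):
        # pass_rates maps problem id -> observed fraction of correct rollouts.
        c = self.competence()  # competence before this round's update
        for pid, observed in pass_rates.items():
            s = self.stats[pid]
            expected = sigmoid(c - s.difficulty)
            s.discrepancy_sum += expected - observed  # positive => harder than expected
            s.rounds += 1


sampler = AlignmentSampler(["easy", "hard", "mid"])
sampler.update({"easy": 1.0, "hard": 0.0, "mid": 0.5})
next_batch = sampler.select(1)
```

In this toy run every problem starts with an assumed difficulty of 0, so the expected pass rate is 0.5 for all three; the problem solved half the time stays aligned with competence and is the one selected next. A real training loop would call `select` before each rollout step and `update` with the measured pass rates afterwards.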

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Dialogue-Adaptive Pre-training Objective