D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

Ru Zhang; Renda Li; Ziyu Ma; Weijie Qiu; Chongyang Tao; Yong Wang; Xiangxiang Chu

arXiv:2605.17037 · cs.LG · May 19, 2026

TL;DR

D$^2$Evo is a novel reinforcement learning framework that dynamically co-evolves question difficulty and reasoning ability, improving data efficiency and reasoning performance in language models.

Contribution

It introduces a dual difficulty-aware self-evolution approach that addresses data scarcity and difficulty mismatch in RL training for reasoning tasks.

Findings

01. Outperforms existing methods on mathematical reasoning benchmarks with fewer than 2K samples.

02. Shows strong generalization on various reasoning benchmarks.

03. Enables progressive reasoning gains through joint optimization of components.

Abstract

Reinforcement learning (RL) has demonstrated potential for enhancing reasoning in large language models (LLMs). However, effective RL training, which requires medium-difficulty training samples, faces two fundamental challenges: Effective Data Scarcity and Dynamic Difficulty Shifts, where medium-difficulty samples are scarce and become trivial as models improve. Existing methods mitigate this scarcity to some extent by generating training samples. However, these approaches suffer from anchor-free generation, ignoring co-evolution, and difficulty mismatch. To address these issues, we propose D$^2$Evo, a Dual Difficulty-aware self-Evolution RL framework. In each iteration, our method mines medium-difficulty anchors based on the current Solver's capability, trains the Questioner to generate diverse questions at appropriate difficulty levels, and jointly optimizes both components to enable…
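
The anchor-mining step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how difficulty is measured, so everything below is an assumption made for illustration: difficulty is estimated as the Solver's empirical pass rate over `k` sampled attempts, correctness is checked by exact string match, and "medium" means a pass rate inside a hypothetical band `[low, high]`. The names `pass_rate`, `mine_anchors`, `k`, `low`, and `high` are not taken from the paper.

```python
from typing import Callable, Dict, List


def pass_rate(solver: Callable[[str], str], question: str, answer: str, k: int = 8) -> float:
    """Fraction of k solver attempts whose output exactly matches the reference answer.

    Assumption: exact string match stands in for whatever verifier is really used.
    """
    hits = sum(solver(question).strip() == answer.strip() for _ in range(k))
    return hits / k


def mine_anchors(
    solver: Callable[[str], str],
    dataset: List[Dict[str, str]],
    k: int = 8,
    low: float = 0.25,   # hypothetical lower bound of the "medium" band
    high: float = 0.75,  # hypothetical upper bound of the "medium" band
) -> List[Dict[str, str]]:
    """Keep items that the current solver answers correctly some, but not all, of the time."""
    anchors = []
    for item in dataset:
        rate = pass_rate(solver, item["question"], item["answer"], k)
        if low <= rate <= high:
            anchors.append(item)
    return anchors
```

Under these assumptions, re-running `mine_anchors` with an updated solver at each iteration would move the selected set as the solver improves, which is the kind of difficulty shift the abstract describes.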
