DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation

Yang Zhou; Can Jin; Zihan Dong; Zhepeng Wang; Yanting Yang; Shiyu Zhao; Lei Li; Runxue Bao; Yaochen Xie; Dimitris N. Metaxas

arXiv:2605.09188 · cs.LG · May 12, 2026

TL;DR

DARE is a reinforcement learning framework that keeps its difficulty estimates in step with the changing policy, aiming to improve training efficiency, make responses to easy tasks more concise, and raise correctness on hard tasks.

Contribution

It introduces a unified, difficulty-aware RL method that co-evolves difficulty estimation with the policy via self-normalized importance sampling, while maintaining diverse difficulty coverage during training.

Findings

01. DARE outperforms existing methods in both training efficiency and final effectiveness.

02. It produces more concise responses on easy tasks and improves correctness on hard ones.

03. The approach is validated across multiple models and domains.

Abstract

Reinforcement learning improves the reasoning ability of large language models but remains costly and sample-inefficient, as many rollouts provide weak learning signals. Difficulty-aware data selection methods attempt to address this by prioritizing moderately difficult prompts, yet our analysis reveals three limitations: difficulty estimates become inaccurate under policy drift, data selection alone yields limited final-performance gains, and inference efficiency remains largely unchanged. These findings suggest that efficient and effective RL requires more than filtering by difficulty: the policy should learn to solve hard tasks while producing concise responses for easy ones. To this end, we propose **Dare**, a unified framework that co-evolves difficulty estimation with the policy via self-normalized importance sampling, maintains diverse difficulty coverage through a symmetric Beta…
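
The abstract names self-normalized importance sampling as the mechanism that keeps difficulty estimates aligned with a drifting policy, but the excerpt does not give the estimator itself. The sketch below is a generic textbook version, not the paper's implementation: it re-estimates a prompt's success rate under the current policy from rollouts that were sampled by an older policy. The function name `snis_success_rate`, the use of sequence-level log-probabilities, and the binary rewards are all assumptions made for illustration.

```python
import math


def snis_success_rate(old_logps, new_logps, rewards):
    """Self-normalized importance sampling estimate of the expected reward
    under the current policy, using rollouts drawn from an older policy.

    old_logps: log-probability of each rollout under the policy that sampled it
    new_logps: log-probability of the same rollout under the current policy
    rewards:   reward of each rollout (assumed here to be 1.0 or 0.0)
    """
    if not rewards or not (len(old_logps) == len(new_logps) == len(rewards)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Importance weights w_i = pi_new(y_i) / pi_old(y_i), kept in log space.
    log_w = [n - o for n, o in zip(new_logps, old_logps)]

    # Subtract the maximum before exponentiating for numerical stability;
    # the shift cancels when the weights are normalized.
    shift = max(log_w)
    weights = [math.exp(lw - shift) for lw in log_w]

    total = sum(weights)
    return sum(w * r for w, r in zip(weights, rewards)) / total


# Hypothetical usage: two stale rollouts, one correct and one incorrect.
# The current policy is three times more likely to produce the correct one.
estimate = snis_success_rate(
    old_logps=[0.0, 0.0],
    new_logps=[math.log(3.0), 0.0],
    rewards=[1.0, 0.0],
)
difficulty = 1.0 - estimate  # one possible difficulty score, assumed here
```

When the old and current policies agree, every weight is equal and the estimate reduces to the plain mean reward. As the policy drifts, rollouts that the current policy favors count for more, so the estimate can track the new success rate without fresh sampling, at the cost of higher variance when the weights become very uneven.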

Code & Models

Repositories

EtaYang10th/DARE (GitHub)
