Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration

Zhicheng Yang; Zhijiang Guo; Yinya Huang; Yongxin Wang; Dongchun Xie; Hanhui Li; Yiwei Wang; Xiaodan Liang; Jing Tang

arXiv:2508.13755·cs.LG·April 14, 2026

TL;DR

This paper introduces DARS (Difficulty Adaptive Rollout Sampling), an adaptive sampling method for RLVR that allocates additional rollouts to difficult problems and, combined with greater training breadth, improves the reasoning performance of large language models.

Contribution

The paper proposes DARS and DARS-Breadth, which improve exploration along the depth (problem difficulty) and breadth (number of training instances) dimensions of RLVR, and shows that they boost reasoning capabilities.

Findings

01 DARS re-weights difficult problems to improve reasoning.

02 Scaling batch size increases breadth and boosts Pass@1.

03 Combining DARS with large breadth yields the best performance.

Abstract

Reinforcement Learning with Verifiable Reward (RLVR) is a powerful method for enhancing the reasoning abilities of Large Language Models, but its full potential is limited by a lack of exploration along two dimensions: Depth (the difficulty of problems) and Breadth (the number of training instances). Our analysis of the popular GRPO algorithm reveals a bias that down-weights difficult, low-accuracy problems, which are crucial for improving reasoning skills. To address this, we introduce Difficulty Adaptive Rollout Sampling (DARS), a method that re-weights difficult problems by using targeted, multi-stage rollouts. DARS increases the number of rollout outcomes for these harder problems according to our proposed re-balancing schedules and leads to consistent gains in Pass@K. We also find that increasing rollout size alone does not improve performance and may actually impair it. In contrast,…
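The down-weighting of low-accuracy problems described in the abstract can be illustrated with a small sketch. It assumes binary rewards and the common GRPO-style advantage, (reward - group mean) / group standard deviation, using the population standard deviation; the function names are hypothetical and are not taken from the DARS repository. Under these assumptions, the summed absolute advantage of a group of N rollouts with accuracy p works out to 2N·sqrt(p(1-p)), which peaks at p = 0.5 and shrinks toward zero as p approaches 0. The helper `rollouts_to_match` is one illustrative way to re-balance and should not be read as the paper's actual schedule, which this summary does not specify.

```python
import math


def grpo_advantages(rewards):
    """Group-normalized advantages for one problem's rollouts.

    Assumption: advantage = (r - mean) / population std, computed per group.
    Groups with identical rewards carry no signal and get zero advantages.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    if std == 0.0:
        return [0.0] * n
    return [(r - mean) / std for r in rewards]


def cumulative_abs_advantage(rewards):
    """Sum of |advantage| over the group: a proxy for the problem's weight."""
    return sum(abs(a) for a in grpo_advantages(rewards))


def rollouts_to_match(n, accuracy):
    """Hypothetical re-balancing rule (not the paper's schedule).

    Returns the group size at which a problem with the given accuracy would
    reach the cumulative advantage that an n-rollout group has at accuracy 0.5.
    """
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return math.ceil(n / (2.0 * math.sqrt(accuracy * (1.0 - accuracy))))


if __name__ == "__main__":
    balanced = [1, 1, 1, 1, 0, 0, 0, 0]  # accuracy 0.5
    hard = [1, 0, 0, 0, 0, 0, 0, 0]      # accuracy 0.125
    print(cumulative_abs_advantage(balanced))
    print(cumulative_abs_advantage(hard))
    print(rollouts_to_match(8, 0.125))
```

In this sketch the hard problem receives a smaller total weight than the balanced one at the same rollout count, and a problem on which every rollout fails receives none at all, which is why extra rollouts on hard problems are needed before any re-weighting can take effect.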

Code & Models

Repositories

yangzhch6/DARS (GitHub)
