Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling
Yiran Guo, Zhongjian Qiao, Yingqi Xie, Jie Liu, Dan Ye, Ruiqing Zhang, Shuang Qiu, Lijie Xu

TL;DR
This paper introduces Deep Dense Exploration (DDE), a novel reinforcement learning strategy for large language models that enhances exploration efficiency by focusing on recoverable, deep states, leading to improved performance on reasoning benchmarks.
Contribution
The paper proposes DDE, an exploration method that uses pivot-driven resampling and a utility function to explore deep, recoverable states in RL for language models, outperforming GRPO and tree-based methods on reasoning benchmarks.
Findings
DDE outperforms GRPO and tree-based methods on reasoning benchmarks.
The utility function effectively balances recoverability and depth bias.
Local dense resampling increases the discovery of correct trajectories.
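The intuition behind the last finding can be illustrated with simple arithmetic. The sketch below is not taken from the paper: it assumes each resampled suffix is an independent draw with a fixed success probability, and the names `discovery_probability` and `diluted_vs_dense` as well as the numbers used are hypothetical. It compares spending a whole budget at one recoverable state with spreading the same budget evenly over several states, only one of which is recoverable.

```python
def discovery_probability(p_success: float, num_samples: int) -> float:
    """Chance that at least one of num_samples independent draws is correct."""
    return 1.0 - (1.0 - p_success) ** num_samples


def diluted_vs_dense(p_success: float, budget: int, num_states: int):
    """Compare an even split of the budget with concentrating it on one state.

    Assumption for illustration: only one of the num_states is recoverable,
    so under the even split only budget // num_states samples can succeed.
    """
    per_state = budget // num_states
    diluted = discovery_probability(p_success, per_state)
    dense = discovery_probability(p_success, budget)
    return diluted, dense


diluted, dense = diluted_vs_dense(p_success=0.1, budget=16, num_states=8)
print(f"diluted: {diluted:.3f}, dense: {dense:.3f}")
# prints "diluted: 0.190, dense: 0.815"
```

Under these toy assumptions, concentrating 16 samples on the single recoverable state finds a correct suffix far more often than giving it 2 of the 16 samples.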
Abstract
Effective exploration is a key challenge in reinforcement learning for large language models: discovering high-quality trajectories within a limited sampling budget from the vast natural language sequence space. Existing methods face notable limitations: GRPO samples exclusively from the root, saturating high-probability trajectories while leaving deep, error-prone states under-explored. Tree-based methods blindly disperse budgets across trivial or unrecoverable states, causing sampling dilution that fails to uncover rare correct suffixes and destabilizes local baselines. To address this, we propose Deep Dense Exploration (DDE), a strategy that focuses exploration on pivots: deep, recoverable states within unsuccessful trajectories. We instantiate DDE with DEEP-GRPO, which introduces three key innovations: (1) a lightweight data-driven utility function that automatically…
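To make the point about rare correct suffixes and local baselines concrete, here is a minimal sketch of a group-relative advantage in the style commonly associated with GRPO: each reward minus the group mean, divided by the group standard deviation. This is an illustration, not DEEP-GRPO itself. The function name, the `eps` constant, the use of the population standard deviation, and the binary reward groups are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampling group (assumed form).

    The baseline is the group mean; eps avoids division by zero when
    every sample in the group receives the same reward.
    """
    baseline = mean(rewards)
    scale = pstdev(rewards) + eps
    return [(r - baseline) / scale for r in rewards]


# Hypothetical group sampled from the root: every trajectory fails.
root_group = [0.0] * 8
# Hypothetical group densely resampled from one pivot: one rare success.
pivot_group = [0.0] * 7 + [1.0]

root_adv = group_relative_advantages(root_group)
pivot_adv = group_relative_advantages(pivot_group)
print(root_adv)   # all zeros: no learning signal
print(pivot_adv)  # the single success gets a positive advantage
```

When every sample in a group fails, all advantages are zero and the group contributes no gradient. A group that contains even one correct suffix yields a positive advantage for that suffix and negative ones for the rest.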
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
