Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling

Yiran Guo; Zhongjian Qiao; Yingqi Xie; Jie Liu; Dan Ye; Ruiqing Zhang; Shuang Qiu; Lijie Xu

arXiv:2602.14169·cs.LG·February 17, 2026

Open Access

TL;DR

This paper introduces Deep Dense Exploration (DDE), a reinforcement learning exploration strategy for large language models that concentrates the sampling budget on deep, recoverable states within unsuccessful trajectories, improving performance on reasoning benchmarks.

Contribution

The paper proposes DDE, an exploration method that uses pivot-driven resampling and a data-driven utility function to explore deep, recoverable states in RL for language models, and reports that it outperforms GRPO and tree-based methods on reasoning benchmarks.

Findings

01 DDE outperforms GRPO and tree-based methods on reasoning benchmarks.

02 The utility function effectively balances recoverability and depth bias.

03 Local dense resampling increases the discovery of correct trajectories.
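The third finding can be illustrated with a small back-of-the-envelope calculation. This is a sketch under stated assumptions, not the paper's analysis: it assumes suffixes sampled from a given state are independent and each is correct with a fixed probability `p`. The budget of 64, the 16-way split, and the value of `p` are illustrative numbers that do not come from the paper.

```python
def discovery_probability(p_correct: float, num_samples: int) -> float:
    """Chance that at least one of num_samples independent suffixes is correct."""
    return 1.0 - (1.0 - p_correct) ** num_samples


budget = 64   # assumed total number of resampled suffixes
p = 0.02      # assumed per-sample success rate from one hard but recoverable state

# Dense: the whole budget is spent at a single state.
dense = discovery_probability(p, budget)

# Diluted: the budget is spread evenly over 16 states, so this state gets 4 samples.
diluted = discovery_probability(p, budget // 16)

print(f"dense: {dense:.3f}  diluted: {diluted:.3f}")
```

Under these assumptions, the chance of finding a rare correct suffix at that state is far higher when the samples are concentrated there. If the other 15 states are trivial or unrecoverable, the samples spent on them add little, which is the dilution effect the abstract attributes to tree-based methods.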

Abstract

Effective exploration is a key challenge in reinforcement learning for large language models: discovering high-quality trajectories within a limited sampling budget from the vast natural language sequence space. Existing methods face notable limitations: GRPO samples exclusively from the root, saturating high-probability trajectories while leaving deep, error-prone states under-explored. Tree-based methods blindly disperse budgets across trivial or unrecoverable states, causing sampling dilution that fails to uncover rare correct suffixes and destabilizes local baselines. To address this, we propose Deep Dense Exploration (DDE), a strategy that focuses exploration on pivots: deep, recoverable states within unsuccessful trajectories. We instantiate DDE with DEEP-GRPO, which introduces three key innovations: (1) a lightweight data-driven utility function that automatically…
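The idea of choosing a pivot along a failed trajectory can be sketched as follows. The abstract is cut off before the utility function is defined, so everything here is a hypothetical stand-in and not the DEEP-GRPO formulation: the `PrefixState` type, the `recoverability` estimate, the product form of `utility`, and the `depth_weight` parameter are all assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrefixState:
    depth: int             # step index within the failed trajectory
    recoverability: float  # assumed estimate of P(correct completion | prefix), in [0, 1]


def utility(state: PrefixState, total_depth: int, depth_weight: float = 1.0) -> float:
    """Hypothetical score that favours states that are deep and still recoverable."""
    relative_depth = state.depth / total_depth
    return state.recoverability * relative_depth ** depth_weight


def select_pivot(states: List[PrefixState], total_depth: int,
                 depth_weight: float = 1.0) -> PrefixState:
    """Return the candidate state with the highest hypothetical utility."""
    return max(states, key=lambda s: utility(s, total_depth, depth_weight))


# Three illustrative candidates along one failed trajectory of length 100.
states = [
    PrefixState(depth=10, recoverability=0.9),   # shallow and easy
    PrefixState(depth=60, recoverability=0.4),   # deep and still recoverable
    PrefixState(depth=95, recoverability=0.01),  # deep but nearly unrecoverable
]
pivot = select_pivot(states, total_depth=100)
print(pivot)
```

With these made-up numbers the middle state is selected: the shallow state scores low on depth and the deepest state scores low on recoverability. Resampling would then be concentrated at the chosen pivot instead of at the root (as in GRPO) or across all branch points (as in tree-based methods).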

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications