Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR

Chanuk Lee; Sangwoo Park; Minki Kang; Sung Ju Hwang

arXiv:2605.15726 · cs.AI · May 18, 2026

TL;DR

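NudgeRL is a strategy-guided exploration framework for RLVR. It improves the reasoning capabilities of language models by conditioning rollouts on lightweight, strategy-level contexts, which induces diverse reasoning trajectories without expensive oracle supervision, and it outperforms standard GRPO run with up to 8x larger rollout budgets.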

Contribution

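The paper proposes Strategy Nudging, a structured exploration method, together with a unified learning objective for learning from the resulting rollouts. The combination improves the efficiency and diversity of RLVR exploration and surpasses both standard GRPO and oracle-guided RL baselines.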

Findings

01

NudgeRL outperforms standard GRPO even when GRPO is given up to 8x larger rollout budgets.

02

It surpasses oracle-guided RL baselines on five math benchmarks.

03

Structured exploration can replace brute-force scaling and privileged information methods.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has emerged as a scalable paradigm for improving the reasoning capabilities of large language models. However, its effectiveness is fundamentally limited by exploration: the policy can only improve on trajectories it has already sampled. While increasing the number of rollouts alleviates this issue, such brute-force scaling is computationally expensive, and existing approaches that modify the optimization objective provide limited control over what is explored. In this work, we propose NudgeRL, a framework for structured and diversity-driven exploration in RLVR. Our approach introduces Strategy Nudging, which conditions each rollout on lightweight, strategy-level contexts to induce diverse reasoning trajectories without relying on expensive oracle supervision. To effectively learn from such structured exploration, we further propose…
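The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the hint strings, the prompt template, and the function names `build_nudged_prompts` and `group_relative_advantages` are all assumptions made for illustration. The first function conditions each rollout in a group on a different strategy-level hint. The second computes GRPO-style group-normalized advantages, which shows the exploration limit: when every rollout in a group earns the same reward, every advantage is zero and the group provides no learning signal.

```python
import random
from typing import List

# Hypothetical strategy hints; the paper's actual strategy contexts
# are not given in this summary.
STRATEGIES = [
    "Try working backwards from the desired result.",
    "Look for a symmetry or an invariant.",
    "Test small cases and generalize the pattern.",
    "Introduce a substitution to simplify the expression.",
]


def build_nudged_prompts(
    question: str,
    num_rollouts: int,
    strategies: List[str],
    seed: int = 0,
) -> List[str]:
    """Return one prompt per rollout, cycling through shuffled hints.

    The template below is an assumed format, not the paper's.
    """
    rng = random.Random(seed)
    order = list(strategies)
    rng.shuffle(order)
    prompts = []
    for i in range(num_rollouts):
        hint = order[i % len(order)]  # reuse hints once all have been used
        prompts.append(f"{question}\n\nStrategy hint: {hint}")
    return prompts


def group_relative_advantages(
    rewards: List[float], eps: float = 1e-6
) -> List[float]:
    """Normalize rewards within one rollout group (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    for prompt in build_nudged_prompts("What is 17 * 23?", 4, STRATEGIES):
        print(prompt, end="\n---\n")
    # All rollouts fail: the advantages are all zero, so nothing is learned.
    print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))
    # One rollout succeeds: it gets a positive advantage, the rest negative.
    print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))
```

Under these assumptions, diversifying the prompts within a group is one way to raise the chance that at least one rollout reaches a correct answer, which is what turns an all-zero group into one with a usable signal.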

Code & Models

Repositories

tally0818/NudgeRL (GitHub)
