Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

Futing Wang; Jianhao Yan; Yun Luo; Ganqu Cui; Zhi Wang; Xiaoye Qu; Yue Zhang; Yu Cheng; Tao Lin

arXiv:2602.11748·cs.CL·February 13, 2026

TL;DR

This paper introduces Length-Incentivized Exploration, a reinforcement learning approach that encourages models to generate longer reasoning trajectories, thereby enhancing their in-context exploration capabilities and improving performance on various tasks.

Contribution

The paper proposes a novel length-based reward and redundancy penalty to overcome the shallow exploration trap, significantly improving in-context exploration in language models.

Findings

1. Achieves a 4.4% improvement on in-domain tasks
2. Achieves a 2.7% improvement on out-of-domain benchmarks
3. Effectively incentivizes longer, more diverse reasoning trajectories

Abstract

Achieving effective test-time scaling requires models to engage in In-Context Exploration: the intrinsic ability to generate, verify, and refine multiple reasoning hypotheses within a single continuous context. Grounded in State Coverage theory, our analysis identifies a critical bottleneck to enabling this capability: while broader state coverage requires longer reasoning trajectories, the probability of sampling such sequences decays exponentially during autoregressive generation, a phenomenon we term the "Shallow Exploration Trap". To bridge this gap, we propose Length-Incentivized Exploration. This simple yet effective recipe explicitly encourages models to explore more via a length-based reward coupled with a redundancy penalty, thereby maximizing state coverage in a two-step manner. Comprehensive experiments across different models (Qwen3, Llama) demonstrate…
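To make the idea of "a length-based reward coupled with a redundancy penalty" concrete, here is a minimal Python sketch. It is not the paper's formulation, which this summary does not give. The function names (`redundancy_ratio`, `shaped_reward`), the n-gram repetition measure, the length cap `target_len`, and the weights `alpha` and `beta` are all hypothetical choices made for illustration.

```python
from collections import Counter


def redundancy_ratio(tokens, n=3):
    """Fraction of n-grams that repeat an earlier n-gram in the sequence.

    Assumption: n-gram repetition stands in for "redundancy"; the paper
    may measure it differently.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def shaped_reward(tokens, task_reward, target_len=8, alpha=0.5, beta=0.5, n=3):
    """Task reward plus a capped length bonus minus a redundancy penalty.

    Assumption: an additive combination with hypothetical weights
    alpha (length) and beta (redundancy).
    """
    # The length bonus grows linearly and saturates at target_len tokens.
    length_bonus = alpha * min(len(tokens) / target_len, 1.0)
    penalty = beta * redundancy_ratio(tokens, n)
    return task_reward + length_bonus - penalty


if __name__ == "__main__":
    diverse = list("abcdefgh")   # long, no repeated trigrams
    looping = list("abababab")   # same length, mostly repeated trigrams
    short = list("ab")           # too short to earn much length bonus
    for name, seq in [("diverse", diverse), ("looping", looping), ("short", short)]:
        print(name, round(shaped_reward(seq, task_reward=1.0), 4))
```

In this toy setup, a long trajectory that only loops over the same content scores lower than an equally long one that covers new content, while a very short trajectory earns little length bonus. That is the qualitative behavior the abstract describes: length is rewarded, but padding is not.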

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics