How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors

Yifan Xu; Junren Chen; Yifan Chen

arXiv:2605.08817·cs.AI·May 12, 2026

TL;DR

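This paper introduces IMAX, a framework that improves exploration in reinforcement learning with verifiable rewards (RLVR) by training a pool of soft prefixes that diversify reasoning trajectories. It reports gains of up to 11.60% in Pass@4 and consistent improvements over standard RLVR across three backbone scales.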

Contribution

IMAX is a model-agnostic approach to improving exploration in RLVR. It combines trainable soft prefixes with an information-maximization reward and outperforms standard RLVR.

Findings

01. IMAX achieves up to 11.60% improvement in Pass@4.

02. IMAX consistently outperforms standard RLVR across three backbone scales.

03. The framework is compatible with existing RLVR pipelines.
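
For readers unfamiliar with the Pass@4 metric cited in the first finding, the sketch below shows the commonly used unbiased Pass@k estimator: given n sampled rollouts for a problem, c of which are verified correct, it estimates the probability that at least one of k rollouts succeeds. This listing does not say how the paper computes Pass@4, so treat this as an assumption about the metric, not a description of the authors' evaluation code. The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate from n rollouts with c correct ones.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    a random size-k subset of the n rollouts contains no correct rollout.
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect rollouts: every size-k subset has a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical example: 8 rollouts per problem, 2 of them correct.
estimate = pass_at_k(n=8, c=2, k=4)
```

A benchmark-level score would typically be the mean of this estimate over all problems.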

Abstract

Reinforcement learning with verifiable rewards (RLVR) has recently thrived in large language model (LLM) reasoning tasks. However, reward sparsity and the long reasoning horizon make effective exploration challenging. In practice, this challenge manifests as the entropy collapse phenomenon, where RLVR improves single-rollout accuracy but fails to expand coverage of successful reasoning trajectories. Passive exploration techniques such as entropy regularization tend to neglect generation quality, resulting in noisy rollouts. In response, we propose an Information-Maximizing Augmented eXploration (IMAX) framework that trains a pool of soft prefixes to reshape the base model's prior over reasoning trajectories. Rather than relying on RL to incentivize exploration on top of the base model, each prefix acts as a trainable control knob that induces a distinct rollout…
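
To make the idea of a "pool of soft prefixes" concrete, here is a minimal sketch of generic soft-prefix conditioning: each prefix is a small matrix of trainable vectors prepended to the prompt's token embeddings, so the same prompt can be rolled out under several different priors. This is an illustration under stated assumptions, not the authors' implementation. The abstract is truncated here and does not say where prefixes are injected (input embeddings versus per-layer keys and values), how many prefixes are used, or how the information-maximization reward is defined. All names and sizes below (`prepend_soft_prefix`, `num_prefixes`, `prefix_len`, `d_model`) are hypothetical, and random arrays stand in for learned parameters.

```python
import numpy as np


def prepend_soft_prefix(token_embeddings, prefix_pool, prefix_id):
    """Prepend one soft prefix from the pool to a prompt's embeddings.

    token_embeddings: (seq_len, d_model) embeddings of the prompt tokens.
    prefix_pool:      (num_prefixes, prefix_len, d_model) trainable vectors.
    Returns an array of shape (prefix_len + seq_len, d_model).
    """
    prefix = prefix_pool[prefix_id]  # (prefix_len, d_model)
    return np.concatenate([prefix, token_embeddings], axis=0)


# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
num_prefixes, prefix_len, d_model = 4, 3, 8
prefix_pool = rng.normal(size=(num_prefixes, prefix_len, d_model))
prompt = rng.normal(size=(5, d_model))

# One conditioned input per prefix: same prompt, different learned prior.
conditioned = [
    prepend_soft_prefix(prompt, prefix_pool, k) for k in range(num_prefixes)
]
```

In a setup like this, only the prefix vectors would need gradients during prefix training, which is one reason such an approach can plug into an existing RLVR pipeline.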
