Reinforced Efficient Reasoning via Semantically Diverse Exploration

Ziqi Zhao; Zhaochun Ren; Jiahong Zou; Liu Yang; Zhiwei Xu; Xuri Ge; Zhumin Chen; Xinyu Ma; Daiting Shi; Shuaiqiang Wang; Dawei Yin; Xin Xin

arXiv:2601.05053·cs.AI·April 21, 2026

TL;DR

ROSE introduces semantically diverse exploration strategies to improve reasoning diversity and efficiency in reinforcement learning for large language models, validated on mathematical reasoning benchmarks.

Contribution

The paper proposes a semantic-entropy-based branching strategy and an ε-exploration mechanism to enhance reasoning diversity and efficiency in RL-based LLM reasoning.

Findings

01 ROSE improves reasoning diversity and efficiency on mathematical benchmarks.

02 Semantic-entropy-based branching captures semantic uncertainty effectively.

03 Length-aware advantage estimator rewards concise, correct reasoning.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has proven effective in enhancing the reasoning of large language models (LLMs). Monte Carlo Tree Search (MCTS)-based extensions improve upon vanilla RLVR (e.g., GRPO) by providing tree-based reasoning rollouts that enable fine-grained, segment-level credit assignment. However, existing methods still suffer from limited exploration diversity and inefficient reasoning. To address these challenges, we propose reinforced efficient reasoning via semantically diverse explorations, i.e., ROSE, for LLMs. To encourage more diverse reasoning exploration, our method incorporates a semantic-entropy-based branching strategy and an $\epsilon$-exploration mechanism. The former operates on already sampled reasoning rollouts to capture semantic uncertainty and select branching points with high semantic divergence to generate new successive…
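The abstract is cut off before it explains how semantic uncertainty is measured, so the following is only an illustrative sketch of one common way to compute a semantic entropy, not the ROSE implementation. It assumes that the continuations sampled at each candidate step have already been grouped into semantic clusters by some external method (the cluster labels, the function names `semantic_entropy` and `pick_branch_point`, and the "branch at the highest-entropy step" rule are all hypothetical choices made for this example).

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def semantic_entropy(cluster_ids: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical distribution over cluster labels.

    Assumption: each label identifies the semantic cluster of one sampled
    continuation; how clusters are obtained is outside this sketch.
    """
    if not cluster_ids:
        raise ValueError("need at least one cluster id")
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    # sum of p * log(1/p) over clusters, with p = count / total
    return sum((c / total) * math.log(total / c) for c in counts.values())


def pick_branch_point(clusters_per_step: Sequence[Sequence[Hashable]]) -> int:
    """Return the index of the step whose continuations are most semantically spread.

    Assumption: "high semantic divergence" is approximated here by the
    largest semantic entropy; ties resolve to the earliest step.
    """
    entropies = [semantic_entropy(ids) for ids in clusters_per_step]
    return max(range(len(entropies)), key=entropies.__getitem__)


# Hypothetical example: three candidate steps, four sampled continuations each.
steps = [
    ["a", "a", "a", "a"],  # all continuations agree
    ["a", "b", "a", "b"],  # two equally likely meanings
    ["a", "b", "c", "d"],  # every continuation differs
]
branch_at = pick_branch_point(steps)
```

In this toy input the third step has the most evenly spread clusters, so it is the one selected for branching. A real system would also need a clustering procedure and a policy for how many branches to expand, neither of which can be inferred from the truncated abstract.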

Code & Models

Repositories

ZiqiZhao1/ROSE-rl (GitHub)
