Targeted Exploration via Unified Entropy Control for Reinforcement Learning

Chen Wang; Lai Wei; Yanzhi Zhang; Chenyang Shao; Zedong Dan; Weiran Huang; Ge Lan; Yue Wang

arXiv:2604.14646 · cs.AI · April 20, 2026

TL;DR

The paper introduces UEC-RL, a unified framework for targeted exploration and stabilization in reinforcement learning. It improves the reasoning capabilities of large language and vision-language models by preserving exploration diversity while keeping training stable.

Contribution

UEC-RL pairs a targeted exploration mechanism, which activates more exploration on difficult prompts, with a stabilizer that keeps entropy from growing uncontrollably, improving RL performance in large language and vision-language models.

Findings

01. UEC-RL achieves a 37.9% relative improvement over GRPO on Geometry3K.

02. Experiments show UEC-RL improves both Pass@1 and Pass@$k$.

03. UEC-RL maintains stable training while expanding exploration.

Abstract

Recent advances in reinforcement learning (RL) have improved the reasoning capabilities of large language models (LLMs) and vision-language models (VLMs). However, the widely used Group Relative Policy Optimization (GRPO) consistently suffers from entropy collapse, causing the policy to converge prematurely and lose diversity. Existing exploration methods introduce additional bias or variance during exploration, making it difficult to maintain optimization stability. We propose Unified Entropy Control for Reinforcement Learning (UEC-RL), a framework that provides targeted mechanisms for exploration and stabilization. UEC-RL activates more exploration on difficult prompts to search for potentially valuable reasoning trajectories. In parallel, a stabilizer prevents entropy from growing uncontrollably, thereby keeping training stable as the model consolidates reliable behaviors.…
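
The abstract describes two ideas that a small sketch can make concrete: exploring more on difficult prompts, and capping entropy so training stays stable. The Python below is an illustration only and is not the UEC-RL algorithm, whose details are not given on this page. `group_relative_advantages` follows the group-normalized advantage commonly described for GRPO. `exploration_coefficient`, along with its parameters `base_coef`, `pass_threshold`, and `entropy_cap`, is a hypothetical gate invented here to show the general shape of difficulty-targeted exploration with an entropy stabilizer.

```python
import math
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and std of its sampling group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def exploration_coefficient(
    rewards: Sequence[float],
    mean_entropy: float,
    base_coef: float = 0.01,      # assumed value, for illustration only
    pass_threshold: float = 0.5,  # assumed: prompts below this pass rate count as difficult
    entropy_cap: float = 2.0,     # assumed: stabilizer switches exploration off above this
) -> float:
    """Hypothetical gate: entropy bonus weight for one prompt's sample group."""
    pass_rate = sum(1 for r in rewards if r > 0) / len(rewards)
    if mean_entropy >= entropy_cap:
        return 0.0  # stabilizer: entropy is already high, add no bonus
    if pass_rate < pass_threshold:
        return base_coef * (1.0 - pass_rate)  # harder prompt, larger bonus
    return 0.0  # easy prompt: no extra exploration


# Usage: one prompt, four sampled responses, only the first one correct.
rewards = [1.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)
coef = exploration_coefficient(rewards, mean_entropy=0.5)
```

In this sketch a prompt with a low pass rate receives a positive entropy-bonus weight, while an easy prompt or an already high-entropy policy receives none. When every reward in a group is identical, the advantages are all zero, so that group contributes no policy-gradient signal under this normalization.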

Code & Models

Repositories

597358816/UEC-RL (GitHub)