Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning

Jia Deng; Jie Chen; Zhipeng Chen; Wayne Xin Zhao; Ji-Rong Wen

arXiv:2508.02260 · cs.CL · August 5, 2025

TL;DR

This paper systematically analyzes how the entropy-performance trade-off in reinforcement learning with verifiable rewards affects large language model training, revealing stage-specific dynamics and proposing adaptive reward methods for improved learning.

Contribution

It provides a detailed empirical study of entropy-performance interactions at multiple granularities and introduces dynamic reward adjustment techniques based on these insights.
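The contribution does not say how the reward is adjusted, so the following is only a hypothetical illustration of one position-aware scheme. It is motivated by the finding that high-entropy tokens at sequence ends correlate with learning efficiency. The sketch assumes a single scalar advantage per sampled response, which it spreads over tokens using weights that grow linearly with position. The function name `position_weighted_advantages` and the parameter `alpha` are invented for illustration and are not the authors' formula.

```python
def position_weighted_advantages(advantage, seq_len, alpha=0.5):
    """Spread one response-level advantage over its tokens (hypothetical scheme).

    Raw weight for token t is 1 + alpha * t / (seq_len - 1), so the last token
    gets weight 1 + alpha and the first gets 1. Weights are renormalized to a
    mean of 1 so the total credit equals advantage * seq_len. This is an
    illustrative assumption, not the method described in the paper.
    """
    if seq_len <= 0:
        return []
    if seq_len == 1:
        return [advantage]
    raw = [1.0 + alpha * t / (seq_len - 1) for t in range(seq_len)]
    mean_w = sum(raw) / seq_len
    return [advantage * w / mean_w for w in raw]


# Usage: a response with advantage 1.0 and 5 tokens.
print(position_weighted_advantages(1.0, 5))
```

Because the weights are renormalized to a mean of 1, the total credit assigned to a response matches uniform per-token credit. Only its distribution across positions changes, and later tokens receive a larger share of the update.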

Findings

01. Entropy reduction in negative samples aids reasoning pattern learning.

02. High-entropy tokens at sequence ends correlate with learning efficiency.

03. Adaptive reward methods improve LLM training performance.
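Token-level findings such as the second one depend on measuring the entropy of the policy's next-token distribution at each position. The following is a minimal sketch of that measurement using Shannon entropy in nats. It then reports the relative positions of a sequence's high-entropy tokens. The helper names and the per-sequence quantile cutoff are assumptions for illustration; the criterion the paper uses to mark a token as "high-entropy" is not given here.

```python
import math


def token_entropy(probs):
    """Shannon entropy (nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def high_entropy_positions(step_probs, quantile=0.8):
    """Relative positions (0.0 = first token, 1.0 = last) of tokens whose
    entropy is at or above the given quantile of this sequence's entropies.

    The quantile cutoff is a hypothetical choice, not the paper's criterion.
    """
    ents = [token_entropy(p) for p in step_probs]
    n = len(ents)
    if n == 0:
        return []
    ranked = sorted(ents)
    cutoff = ranked[min(n - 1, int(quantile * n))]
    denom = max(n - 1, 1)
    return [t / denom for t, h in enumerate(ents) if h >= cutoff]


# Usage: toy two-way distributions that grow more uncertain toward the end.
steps = [[0.99, 0.01], [0.98, 0.02], [0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
print(high_entropy_positions(steps))
```

In this toy sequence only the final token clears the 80th-percentile cutoff, so the reported relative position is 1.0 (the sequence end).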

Abstract

Recently, reinforcement learning with verifiable rewards (RLVR) has been widely used for enhancing the reasoning abilities of large language models (LLMs). A core challenge in RLVR involves managing the exchange between entropy and performance of policies. Despite the importance of this exchange, a fine-grained understanding of when and how this exchange operates most effectively remains limited. To bridge this gap, we conduct a systematic empirical analysis of the entropy-performance exchange mechanism of RLVR across different levels of granularity. Specifically, we first divide the training process into two distinct stages based on entropy dynamics, i.e., a rising stage and a plateau stage, and then systematically investigate how this mechanism varies across stage-level, instance-level, and token-level granularities. Our analysis reveals that, in the rising stage, entropy reduction in…
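The abstract splits training into a rising stage and a plateau stage "based on entropy dynamics" but does not state the rule here. The following is one hypothetical way to place the boundary on a logged entropy curve. It takes the first step at which the mean absolute per-step entropy change over a trailing window falls below a tolerance. The function name `split_stages` and the `window` and `tol` parameters are assumptions, not the paper's procedure.

```python
def split_stages(entropy_curve, window=3, tol=0.01):
    """Return the curve index where a hypothetical plateau stage begins.

    diffs[i] is |entropy[i+1] - entropy[i]|. The plateau starts at the first
    index t whose trailing `window` changes (from entropy[t-window] to
    entropy[t]) average below `tol`. If no such index exists, the whole run
    is treated as the rising stage and len(entropy_curve) is returned.
    """
    diffs = [abs(b - a) for a, b in zip(entropy_curve, entropy_curve[1:])]
    for t in range(window, len(diffs) + 1):
        if sum(diffs[t - window:t]) / window < tol:
            return t
    return len(entropy_curve)


# Usage: a toy curve whose entropy changes quickly, then flattens out.
curve = [2.0, 1.5, 1.1, 0.8, 0.6, 0.595, 0.59, 0.588, 0.587]
boundary = split_stages(curve)
rising, plateau = curve[:boundary], curve[boundary:]
print(boundary, rising, plateau)
```

A trailing window smooths out single-step noise in logged entropy, at the cost of placing the boundary a few steps after the curve actually flattens.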

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Explainable Artificial Intelligence (XAI)