Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning
Hongye Cao, Zhixin Bai, Ziyue Peng, Boyan Wang, Tianpei Yang, Jing Huo, Yuyao Zhang, and Yang Gao

TL;DR
This paper introduces an entropy-aware reinforcement learning framework for large language models that improves reasoning by leveraging both semantic and token entropy, through semantic entropy-guided curriculum learning and targeted token-level regularization.
Contribution
It proposes an RL approach that uses semantic- and token-level entropy signals, combining semantic entropy-guided curriculum learning with non-uniform token regularization to mitigate entropy collapse and enhance reasoning.
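Semantic entropy-guided curriculum learning orders training prompts from low to high semantic entropy, so that easier tasks come first. The following is a minimal Python sketch of that ordering, not the paper's implementation. It assumes each prompt already has several sampled responses and treats two responses as semantically equivalent when their normalized final answers match exactly, which is a simplification of semantic clustering. The helper names `semantic_entropy` and `build_curriculum` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def semantic_entropy(answers: List[str]) -> float:
    """Shannon entropy (nats) over clusters of sampled answers.

    Assumption: responses are grouped into the same semantic cluster when
    their stripped, lower-cased final answers match exactly.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = Counter(a.strip().lower() for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())


def build_curriculum(samples: Dict[str, List[str]]) -> List[str]:
    """Return prompt ids ordered from low to high semantic entropy."""
    scores = {pid: semantic_entropy(ans) for pid, ans in samples.items()}
    # sorted() is stable, so prompts with equal entropy keep their input order.
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    samples = {
        "q1": ["12", "12", "15", "9"],  # partial agreement
        "q2": ["4", "4", "4", "4"],     # full agreement, entropy 0
        "q3": ["7", "8", "9", "10"],    # no agreement, entropy ln 4
    }
    print(build_curriculum(samples))  # ['q2', 'q1', 'q3']
```

Prompts whose sampled answers all agree come first and prompts with scattered answers come last. A real training pipeline would more likely bucket prompts into stages than rely on a single global sort.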
Findings
Outperforms existing entropy-based methods on 6 benchmarks
Effective mitigation of entropy collapse in LLMs
Improved reasoning capabilities across multiple models
Abstract
Reinforcement learning with verifiable rewards (RLVR) has demonstrated superior performance in enhancing the reasoning capability of large language models (LLMs). However, this accuracy-oriented learning paradigm often suffers from entropy collapse, which reduces policy exploration and limits reasoning capabilities. To address this challenge, we propose an efficient reinforcement learning framework that leverages entropy signals at both the semantic and token levels to improve reasoning. From the data perspective, we introduce semantic entropy-guided curriculum learning, organizing training data from low to high semantic entropy to guide progressive optimization from easier to more challenging tasks. For the algorithmic design, we adopt non-uniform token treatment by imposing KL regularization on low-entropy tokens that critically impact policy exploration and applying stronger…
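The KL regularization on low-entropy tokens can be sketched as follows. The sketch assumes per-position next-token distributions from the current policy and a frozen reference policy are available as plain probability lists, and it shows only the KL term on low-entropy tokens. The entropy `threshold`, the coefficient `beta`, averaging over all positions, and every function name here are hypothetical choices for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) over a shared vocabulary; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def low_entropy_kl_penalty(
    policy: List[Sequence[float]],
    reference: List[Sequence[float]],
    threshold: float,
    beta: float = 0.1,
) -> float:
    """KL-to-reference penalty applied only at low-entropy positions.

    Hypothetical rule: a position counts as low-entropy when the policy's
    entropy there is below `threshold`. Other positions add nothing, and the
    total is averaged over all positions in the sequence.
    """
    if not policy or len(policy) != len(reference):
        raise ValueError("policy and reference must be non-empty and aligned")
    total = 0.0
    for p, r in zip(policy, reference):
        if token_entropy(p) < threshold:  # regularize only near-deterministic tokens
            total += kl_divergence(p, r)
    return beta * total / len(policy)


if __name__ == "__main__":
    policy = [[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]]
    reference = [[0.70, 0.10, 0.10, 0.10], [0.40, 0.20, 0.20, 0.20]]
    # Only position 0 (entropy ~0.17 nats) falls below the 0.5 threshold.
    print(low_entropy_kl_penalty(policy, reference, threshold=0.5))
```

Masking by entropy pulls near-deterministic positions toward the reference policy while leaving high-entropy positions free to explore. In practice this would be computed on batched logit tensors inside the RL loss rather than on Python lists.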
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
