Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning

Hongye Cao; Zhixin Bai; Ziyue Peng; Boyan Wang; Tianpei Yang; Jing Huo; Yuyao Zhang; and Yang Gao

arXiv:2512.04359 · cs.AI · January 19, 2026

Open Access

TL;DR

This paper introduces an entropy-aware reinforcement learning framework for large language models that improves reasoning by combining entropy signals at the semantic and token levels, a semantic entropy-guided curriculum, and targeted token-level regularization.

Contribution

It proposes a novel RL approach using semantic and token entropy signals, semantic entropy-guided curriculum, and non-uniform token regularization to prevent entropy collapse and enhance reasoning.
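As a rough illustration of non-uniform token regularization, the sketch below computes per-position token entropy from logits and averages a KL penalty over low-entropy positions only. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the fixed `threshold`, the use of a reference policy, and the KL direction (policy relative to reference) are all hypothetical choices made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy in nats of one token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl(p, q):
    """KL(p || q) in nats; assumes q is strictly positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def masked_kl_penalty(policy_logits, ref_logits, threshold):
    """Mean KL(policy || reference) over low-entropy positions only.

    `threshold` is a hypothetical cutoff: positions whose policy entropy
    falls below it are regularized, all other positions are skipped.
    """
    total, count = 0.0, 0
    for pl, rl in zip(policy_logits, ref_logits):
        p, q = softmax(pl), softmax(rl)
        if entropy(p) < threshold:
            total += kl(p, q)
            count += 1
    return total / count if count else 0.0

# Toy sequence of two positions over a three-token vocabulary.
policy_logits = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # confident, then uniform
ref_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # uniform reference
penalty = masked_kl_penalty(policy_logits, ref_logits, threshold=0.5)
```

In this toy input only the first position is penalized, because the second position is uniform and its entropy (log 3) is above the cutoff.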

Findings

01 Outperforms existing entropy-based methods on 6 benchmarks

02 Effective mitigation of entropy collapse in LLMs

03 Improved reasoning capabilities across multiple models

Abstract

Reinforcement learning with verifiable rewards (RLVR) has demonstrated superior performance in enhancing the reasoning capability of large language models (LLMs). However, this accuracy-oriented learning paradigm often suffers from entropy collapse, which reduces policy exploration and limits reasoning capabilities. To address this challenge, we propose an efficient reinforcement learning framework that leverages entropy signals at both the semantic and token levels to improve reasoning. From the data perspective, we introduce semantic entropy-guided curriculum learning, organizing training data from low to high semantic entropy to guide progressive optimization from easier to more challenging tasks. For the algorithmic design, we adopt non-uniform token treatment by imposing KL regularization on low-entropy tokens that critically impact policy exploration and applying stronger…
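The curriculum idea in the abstract, ordering training data from low to high semantic entropy, can be sketched as follows. This is an illustrative sketch only: it assumes several answers have already been sampled per prompt and that two samples share a meaning exactly when their normalized final answers match, which is a simplification chosen for the example. The names `semantic_entropy`, `order_curriculum`, and `samples_by_prompt` are hypothetical.

```python
import math
from collections import Counter

def semantic_entropy(answers):
    """Entropy in nats over clusters of sampled answers.

    Assumption: samples belong to the same semantic cluster when their
    normalized final answers are identical strings.
    """
    clusters = Counter(a.strip().lower() for a in answers)
    total = sum(clusters.values())
    return -sum((c / total) * math.log(c / total) for c in clusters.values())

def order_curriculum(samples_by_prompt):
    """Return prompt ids sorted from low to high semantic entropy."""
    return sorted(
        samples_by_prompt,
        key=lambda prompt_id: semantic_entropy(samples_by_prompt[prompt_id]),
    )

# Hypothetical sampled answers for three prompts.
samples_by_prompt = {
    "p_hard": ["1", "2", "3", "4"],   # every sample disagrees
    "p_easy": ["42", "42", "42", "42"],  # every sample agrees
    "p_mid": ["7", "7", "7", "9"],    # mostly agrees
}
curriculum = order_curriculum(samples_by_prompt)
```

Prompts on which the sampled answers agree come first, and prompts with widely scattered answers come last, matching the easier-to-harder progression the abstract describes.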

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques