Beyond High-Entropy Exploration: Correctness-Aware Low-Entropy Segment-Based Advantage Shaping for Reasoning LLMs

Xinzhu Chen; Xuesheng Li; Zhongxiang Sun; Weijie Yu

arXiv:2512.00908·cs.LG·December 2, 2025

TL;DR

This paper introduces LESS, a reinforcement learning framework that applies correctness-aware advantage shaping to the low-entropy segments of reasoning trajectories, improving the accuracy and robustness of large language models over strong RL baselines.

Contribution

The paper proposes a correctness-aware advantage shaping method that operates on low-entropy segments, improving reasoning performance in LLMs beyond what approaches centered on high-entropy exploration achieve.

Findings

01 LESS improves accuracy across multiple benchmarks.

02 It enhances the robustness of reasoning models.

03 It outperforms strong RL baselines on math tasks.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has become a central approach for improving the reasoning ability of large language models. Recent work studies RLVR through token entropy, arguing that high-entropy tokens drive exploration and should receive stronger updates. However, this line of work overlooks the fact that most of a reasoning trajectory consists of low-entropy segments that encode stable and reusable structural patterns. Through qualitative and quantitative analyses, we find that the overlap of low-entropy segments across correct responses strongly correlates with model accuracy, while overlaps involving incorrect responses exhibit stable but unproductive patterns. Motivated by these findings, we propose LESS, a correctness-aware reinforcement framework that performs fine-grained advantage modulation over low-entropy segments. LESS amplifies segments unique to correct…
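
The quantities the abstract refers to (token entropy, low-entropy segments, and segment overlap across responses) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the entropy threshold, the minimum segment length, the exact-match notion of overlap, and all function names below are assumptions made only for illustration, and the sketch does not attempt to reproduce the advantage modulation rule itself.

```python
import math
from typing import List, Set, Tuple


def token_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def low_entropy_segments(
    entropies: List[float],
    threshold: float,   # assumed cutoff; not taken from the paper
    min_len: int = 2,   # assumed minimum run length; not taken from the paper
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of consecutive tokens
    whose entropy is below the threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, h in enumerate(entropies):
        if h < threshold:
            if start is None:
                start = i  # a new low-entropy run begins here
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(entropies) - start >= min_len:
        segments.append((start, len(entropies)))
    return segments


def segment_token_sets(
    tokens: List[str], segments: List[Tuple[int, int]]
) -> Set[Tuple[str, ...]]:
    """Represent each segment by its token tuple so segments from
    different responses can be compared by exact match (an assumption)."""
    return {tuple(tokens[s:e]) for s, e in segments}


# Hypothetical per-token entropies for one response.
entropies = [0.1, 0.05, 1.2, 0.02, 0.03, 0.01, 0.9, 0.1]
spans = low_entropy_segments(entropies, threshold=0.5)
```

With per-response segment sets in hand, the overlap between two responses is a plain set intersection, and segments unique to one response are a set difference. How the paper actually defines overlap and uses it to shape advantages is not recoverable from the truncated abstract.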

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications