Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

Shuyu Wei; Jian Sun; Delai Qiu; Yining Wang; Shengping Liu; Jiaen Liang; Ying Fu; Wei Huang; Jitao Sang

arXiv:2605.19358·cs.CL·May 20, 2026

TL;DR

This paper introduces Conditional Entropy Shaping (CES), a framework that controls token-level response entropy so that an LLM produces concise solutions on simple problems and explores more deeply on hard ones, with the aim of improving both accuracy and efficiency.

Contribution

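CES builds on DAPO and uses token-level entropy as an uncertainty signal: it penalizes high-entropy "forking point" tokens on correct reasoning paths and rewards them on incorrect paths. The authors report improved LLM performance on mathematical benchmarks compared to existing approaches.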

Findings

01

CES improves average accuracy on 12 mathematical benchmarks.

02

CES reduces response length while maintaining or improving accuracy.

03

CES shows similar benefits on smaller and out-of-domain models.

Abstract

Entropy-based deep reasoning has emerged as a promising direction for improving the reasoning capabilities of Large Language Models (LLMs), but existing methods often either increase response length indiscriminately or shorten responses at the cost of accuracy. To better balance this trade-off, we introduce Conditional Entropy Shaping (CES), a framework that dynamically controls token-level response entropy, enabling LLMs to produce concise solutions on simple problems while encouraging deeper exploration on hard ones. Built on DAPO, CES uses token-level entropy as an uncertainty signal and applies a conditional bidirectional policy: it penalizes high-entropy "forking point" tokens on correct reasoning paths to improve conciseness, and rewards them on incorrect paths to encourage exploration and error correction. We implement CES on DeepSeek-R1-Distill-7B and evaluate it on 12…
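The conditional bidirectional idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state how "forking point" tokens are selected, how large the adjustment is, or how it enters the DAPO objective. The function names, the top-fraction threshold (`top_fraction`), and the coefficient (`coeff`) below are all assumptions chosen for illustration only.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) at one token position."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def shape_rewards(step_logits, is_correct, base_reward,
                  top_fraction=0.2, coeff=0.1):
    """Hypothetical per-token reward shaping in the spirit of CES.

    Assumption: tokens whose entropy falls in the top `top_fraction` of the
    response are treated as "forking points". On a correct response their
    reward is lowered in proportion to entropy; on an incorrect response it
    is raised. All other tokens keep `base_reward`.
    """
    entropies = [token_entropy(logits) for logits in step_logits]
    k = max(1, round(top_fraction * len(entropies)))
    # Entropy of the k-th most uncertain token; ties at this value also count.
    threshold = sorted(entropies, reverse=True)[k - 1]
    sign = -1.0 if is_correct else 1.0
    return [
        base_reward + (sign * coeff * h if h >= threshold else 0.0)
        for h in entropies
    ]


# Toy response: four confident positions and one maximally uncertain one.
confident = [10.0, 0.0, 0.0, 0.0]
uncertain = [0.0, 0.0, 0.0, 0.0]
toy_logits = [confident, confident, uncertain, confident, confident]

on_correct = shape_rewards(toy_logits, is_correct=True, base_reward=1.0)
on_incorrect = shape_rewards(toy_logits, is_correct=False, base_reward=0.0)
```

In this toy case only the third position is flagged: its reward drops below the base value when the response is correct and rises above it when the response is incorrect, while the confident positions are left unchanged.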
