Entropy Centroids as Intrinsic Rewards for Test-Time Scaling

Wenshuo Zhao; Qi Zhu; Xingshan Zeng; Fei Mi; Lifeng Shang; Yi R. (May) Fung

arXiv:2604.26173 · cs.LG · May 4, 2026

TL;DR

This paper introduces an intrinsic reward based on entropy centroids. High-entropy tokens are grouped into consecutive clusters, which give a more stable estimate of model uncertainty during inference and improve response selection in large language models.
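The clustering step can be sketched as a single pass that collects runs of consecutive tokens whose entropy exceeds a threshold. This is a minimal illustration, not the authors' code. The function name `high_entropy_clusters` and its `threshold` and `min_len` parameters are hypothetical. How the paper actually decides that a token is "high-entropy" is not stated on this page.

```python
from typing import List, Optional, Tuple


def high_entropy_clusters(
    entropies: List[float], threshold: float, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of consecutive tokens whose entropy
    exceeds `threshold`. Runs shorter than `min_len` tokens are dropped.
    `threshold` and `min_len` are hypothetical knobs, not the paper's settings."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, h in enumerate(entropies):
        if h > threshold:
            if start is None:
                start = i  # open a new run of high-entropy tokens
        elif start is not None:
            if i - start >= min_len:
                spans.append((start, i))  # close the current run
            start = None
    # A run may extend to the last token of the response.
    if start is not None and len(entropies) - start >= min_len:
        spans.append((start, len(entropies)))
    return spans


ents = [0.1, 2.3, 2.8, 0.2, 0.1, 1.9, 0.3, 2.5, 2.6, 2.7]
print(high_entropy_clusters(ents, threshold=1.5))             # [(1, 3), (5, 6), (7, 10)]
print(high_entropy_clusters(ents, threshold=1.5, min_len=2))  # [(1, 3), (7, 10)]
```

Grouping tokens into runs, instead of scoring each token separately, reflects the page's point: an isolated spike in entropy says less about uncertainty than a contiguous stretch of it.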

Contribution

The paper proposes the entropy centroid as a new measure of model uncertainty, enabling more stable and effective response selection without external reward models.
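This page does not give the exact formula for the entropy centroid. The sketch below uses one plausible reading, and that reading is an assumption: the centroid is the entropy-weighted mean position of the tokens inside high-entropy spans, normalized by response length. Selection then picks the candidate with the lowest centroid. That direction is also assumed, based only on the method being named "Lowest Centroid". The function names are hypothetical. For the real definition, see the paper and the linked repository.

```python
from typing import List, Optional, Sequence, Tuple


def entropy_centroid(
    entropies: Sequence[float], spans: Sequence[Tuple[int, int]]
) -> Optional[float]:
    """ASSUMED definition: entropy-weighted mean token position inside the
    high-entropy spans, divided by response length so it lies in [0, 1).
    Returns None when there are no spans (no cluster of uncertainty)."""
    mass = 0.0    # total entropy inside the spans
    moment = 0.0  # entropy-weighted sum of token positions
    for start, end in spans:
        for i in range(start, end):
            mass += entropies[i]
            moment += entropies[i] * i
    if mass == 0.0:
        return None
    return moment / mass / len(entropies)


def select_lowest_centroid(centroids: Sequence[Optional[float]]) -> int:
    """Return the index of the candidate with the lowest centroid.
    ASSUMPTION: a candidate with no clusters (None) is scored 0.0, i.e. it is
    treated as showing no uncertainty."""
    if not centroids:
        raise ValueError("need at least one candidate response")
    scores: List[float] = [0.0 if c is None else c for c in centroids]
    return min(range(len(scores)), key=scores.__getitem__)


a = entropy_centroid([0.0, 2.0, 2.0, 0.0], [(1, 3)])                  # 0.375
b = entropy_centroid([0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0], [(5, 7)])
print(select_lowest_centroid([b, a]))  # 1: the earlier uncertainty cluster wins
```

This scoring requires no trained reward model. It only needs per-token entropies from the same forward passes that generated each candidate.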

Findings

01

The Lowest Centroid selection method outperforms existing baselines across tasks.

02

Performance gains are stable and grow with model size.

03

A lower entropy centroid correlates with higher response quality.

Abstract

An effective way to scale up the test-time compute of large language models is to sample multiple responses and then select the best one, as in Grok Heavy and Gemini Deep Think. Existing selection methods often rely on external reward models, which require training a strong reward model and introduce additional computational overhead. As an alternative, previous approaches have explored intrinsic signals such as confidence and entropy, but these signals become noisy when aggregated naively. In this work, we observe that high-entropy tokens tend to cluster into consecutive groups during inference, providing a more stable notion of model uncertainty than individual tokens do. Together, these clusters reveal temporal patterns of model uncertainty throughout the inference process. Motivated by this observation, we propose to use the temporal structure of uncertainty as an intrinsic reward. To this…
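Every entropy-based signal above starts from the Shannon entropy of the model's next-token distribution at each decoding step. The pure-Python sketch below computes that value from one step's logits. It also shows the simple mean over a response, the kind of naive aggregation the abstract calls noisy, and it is included only as that baseline. The function names are hypothetical, and logits are assumed to arrive as plain lists of floats.

```python
import math
from typing import Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits) for one decoding step."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    log_z = math.log(z)
    # H = -sum_i p_i * log p_i, where log p_i = (x_i - m) - log z
    return -sum((e / z) * ((x - m) - log_z) for x, e in zip(logits, exps))


def mean_entropy(step_logits: Sequence[Sequence[float]]) -> float:
    """Naive aggregation baseline: average per-token entropy over a response."""
    hs = [token_entropy(step) for step in step_logits]
    return sum(hs) / len(hs)


print(token_entropy([0.0, 0.0]))   # ln 2, about 0.693: two equally likely tokens
print(token_entropy([100.0, 0.0])) # close to 0: the model is nearly certain
```

A single mean like this lets many small fluctuations cancel or dominate. Clustering consecutive high-entropy tokens keeps the positional information that the mean discards.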

Code & Models

Repositories

hkust-nlp/entropy-centroid (GitHub)
