Why Semantic Entropy Fails: Geometry-Aware and Calibrated Uncertainty for Policy Optimization

Zheyuan Zhang; Kaiwen Shi; Han Bao; Zehong Wang; Tianyi Ma; Yanfang Ye

arXiv:2605.21801 · cs.LG · May 22, 2026

TL;DR

This paper introduces GCPO, a geometry-aware and calibrated uncertainty framework for policy optimization that improves post-training performance by better characterizing gradient variance and learning signals.

Contribution

The paper provides what the authors describe as, to their knowledge, the first principled formulation of uncertainty signals as regulators of gradient variance. It addresses two gaps in entropy-based estimators with a geometry-aware and calibrated approach.

Findings

01. GCPO more accurately tracks gradient variability.

02. GCPO consistently improves post-training performance.

03. The approach offers a principled perspective for robust post-training.

Abstract

Post-training has become central to improving reasoning and alignment in large language models, where critic-free models enable scalable learning from model-generated outputs but lack principled mechanisms to distinguish informative from noisy signals. Recent approaches leverage response-level measures as uncertainty signals to regulate group-based optimization methods such as GRPO. Yet their empirical success remains unstable, and how they influence optimization dynamics remains unclear. In this paper, we provide, to our knowledge, the first principled formulation that interprets uncertainty signals as mechanisms for characterizing and regulating gradient variance and learning signal quality. Based on both empirical and theoretical analysis, we identify two critical gaps in current entropy-based estimators: the anisotropic gap and the calibration gap. Motivated by this analysis, we propose…
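
To make the setting in the abstract concrete, here is a minimal sketch of the kind of baseline it critiques: a response-level entropy signal used to scale group-relative advantages. It is not GCPO and is not taken from the paper. The function names, the assumption that responses have already been assigned meaning-cluster ids, and the linear scaling rule are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def semantic_entropy(cluster_ids):
    """Entropy (in nats) of the empirical distribution over meaning clusters.

    Assumes each sampled response was already mapped to a cluster id by
    some external equivalence check (hypothetical, not shown here).
    """
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_scaled_advantages(rewards, cluster_ids):
    """Hypothetical regulation rule: scale advantages by normalized entropy.

    The weight lies in [0, 1]; a group whose responses all share one
    meaning gets weight 0, a group with all-distinct meanings gets 1.
    """
    n = len(rewards)
    max_entropy = math.log(n) if n > 1 else 1.0
    weight = semantic_entropy(cluster_ids) / max_entropy
    return [weight * a for a in group_relative_advantages(rewards)]
```

In this sketch the weight depends only on cluster frequencies. It ignores how far apart the responses are and whether the entropy value is calibrated against the actual gradient noise, which is one possible reading of the two gaps the abstract names.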
