Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
Qiannian Zhao, Chen Yang, Jinhao Jing, Yunke Zhang, Xuhui Ren, Lu Yu, Shijie Zhang, Hongzhi Yin

TL;DR
This paper introduces EGPO, a framework that incorporates the model's intrinsic uncertainty into Reinforcement Learning with Verifiable Rewards (RLVR). By calibrating entropy, it addresses the uncertainty-reward mismatch and improves reasoning accuracy.
Contribution
EGPO is a novel entropy calibration method that explicitly integrates intrinsic uncertainty into RLVR, enhancing reasoning performance without modifying the verifier or reward structure.
Findings
EGPO improves reasoning accuracy across multiple benchmarks.
Uncertainty calibration stabilizes policy optimization in reasoning tasks.
The method effectively distinguishes high- and low-uncertainty solutions.
Abstract
Large reasoning models (LRMs) have emerged as a powerful paradigm for solving complex real-world tasks. In practice, these models are predominantly trained via Reinforcement Learning with Verifiable Rewards (RLVR), yet most existing outcome-only RLVR pipelines rely almost exclusively on a binary correctness signal and largely ignore the model's intrinsic uncertainty. We term this discrepancy the uncertainty-reward mismatch, under which high- and low-uncertainty solutions are treated equivalently, preventing the policy from learning to "Know What You Know" and impeding the shift from optimizing for correct answers to optimizing for effective reasoning paths. This limitation is especially critical in reasoning-centric tasks such as mathematics and question answering, where performance hinges on the quality of the model's internal reasoning process rather than mere memorization of final answers. To…
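The uncertainty-reward mismatch described above can be illustrated with a minimal sketch. This is not the EGPO method itself, which the truncated abstract does not specify. It assumes mean token-level Shannon entropy as a stand-in for intrinsic uncertainty, and the helper names and toy distributions (token_entropy, mean_entropy, binary_reward, confident, uncertain) are hypothetical. It shows only that an outcome-only binary reward assigns the same value to two correct responses whose uncertainty differs sharply.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(step_distributions):
    """Average token-level entropy over a sampled response.

    Assumption: this average is used here as a simple proxy for the
    model's intrinsic uncertainty; the paper may define it differently.
    """
    return sum(token_entropy(d) for d in step_distributions) / len(step_distributions)

def binary_reward(answer, reference):
    """Outcome-only verifiable reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if answer == reference else 0.0

# Two hypothetical responses that reach the same correct final answer.
# Each inner list is a toy next-token distribution over a 4-token vocabulary.
confident = [[0.97, 0.01, 0.01, 0.01]] * 3   # sharply peaked at every step
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 3   # uniform at every step

r_confident = binary_reward("42", "42")
r_uncertain = binary_reward("42", "42")
h_confident = mean_entropy(confident)
h_uncertain = mean_entropy(uncertain)

# Both rewards are identical even though the entropies differ.
print(f"confident: reward={r_confident}, mean entropy={h_confident:.3f}")
print(f"uncertain: reward={r_uncertain}, mean entropy={h_uncertain:.3f}")
```

Because the reward alone cannot separate the two responses, any training signal that is meant to distinguish them has to draw on an additional quantity such as the entropy computed here.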
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
