Calibrating LLMs with Semantic-level Reward
Fengfei Yu, Ruijia Niu, Dongxia Wu, Yian Ma, Rose Yu

TL;DR
This paper introduces Calibration with Semantic Reward (CSR), a semantic-level reward framework that improves the calibration of large language models so that their confidence estimates are more reliable across tasks and distributions.
Contribution
CSR directly calibrates LLMs in semantic space, overcoming verbalized confidence limitations and enhancing calibration metrics across multiple datasets and model families.
Findings
CSR reduces ECE by up to 40% compared to baselines.
CSR improves AUROC by up to 31% over verbalized-confidence methods.
Calibration behavior generalizes robustly across datasets and model types.
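For readers unfamiliar with the ECE metric cited above, the following is a minimal sketch of the standard expected calibration error with equal-width confidence bins. The function name, the choice of 10 bins, and the toy inputs are assumptions made for illustration; the paper's exact evaluation protocol is not given in this summary.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: floats in [0, 1]; correct: 0/1 labels of the same length.
    The equal-width binning and n_bins=10 are illustrative assumptions.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, label in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, label))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by its share of all predictions.
        ece += (len(members) / n) * abs(mean_conf - accuracy)
    return ece


# Toy example: four answers at 0.9 confidence, three of them correct.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

Lower values indicate that stated confidence tracks empirical accuracy more closely; a model that is right exactly as often as it claims to be scores 0.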
Abstract
As large language models (LLMs) are deployed in consequential settings such as medical question answering and legal reasoning, the ability to estimate when their outputs are likely to be correct is essential for safe and reliable use, requiring well-calibrated uncertainty. Standard reinforcement learning with verifiable rewards (RLVR) trains models with a binary correctness reward that is indifferent to confidence, providing no penalty for confident but wrong predictions and thereby degrading calibration. Recent work addresses this by training models to produce verbalized confidence scores alongside answers and rewarding agreement with correctness. However, verbalized confidence is calibrated at the token level and thus exhibits inconsistency across textual variations with the same semantic meaning. We propose Calibration with Semantic Reward (CSR), a framework that calibrates…
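The contrast the abstract draws between a binary correctness reward and a reward that checks confidence against correctness can be illustrated with a small sketch. The Brier-style form below is one common choice, assumed here purely for illustration; it is not taken from the paper and is not CSR's semantic-level reward, which this truncated abstract does not specify. Both function names are hypothetical.

```python
def binary_reward(correct: bool) -> float:
    """Binary correctness reward: ignores confidence entirely."""
    return 1.0 if correct else 0.0


def brier_style_reward(correct: bool, confidence: float) -> float:
    """Correctness minus a squared-error penalty on the stated confidence.

    Assumed illustrative form, not the paper's reward.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    y = 1.0 if correct else 0.0
    return y - (confidence - y) ** 2


# A wrong answer earns the same binary reward however confident the model was,
# while the confidence-aware reward penalizes the overconfident case more.
wrong_binary = binary_reward(False)
wrong_confident = brier_style_reward(False, 0.9)
wrong_hesitant = brier_style_reward(False, 0.1)
```

Under the binary reward a wrong answer scores 0 regardless of confidence; under the assumed Brier-style reward, the wrong answer stated at 0.9 confidence scores lower than the same wrong answer stated at 0.1.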
