TL;DR
This paper addresses the instability of uncertainty estimation metrics in large language models by proposing a post-hoc calibration method called Truth Anchoring (TAC) to improve reliability and truth-alignment.
Contribution
The paper introduces TAC, a novel calibration technique that enhances the reliability of UE metrics in LLMs, especially in low-information regimes, with a practical calibration protocol.
Findings
TAC improves the calibration of uncertainty estimates in LLMs.
UE metrics become non-discriminative in low-information regimes without calibration.
The code for TAC is publicly available at the provided GitHub link.
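To make "non-discriminative" concrete, here is a minimal sketch of how a UE metric's discriminative power is commonly measured with AUROC. This is an illustration, not the paper's evaluation code: the function name `auroc`, the toy scores, and the convention that label 1 marks a hallucinated output are all assumptions.

```python
def auroc(scores, labels):
    """Probability that a hallucinated output (label 1) receives a higher
    uncertainty score than a correct one (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hallucinated, 0 = correct.
labels = [0, 0, 1, 1]
informative = [0.1, 0.2, 0.8, 0.9]    # uncertainty separates the classes
uninformative = [0.5, 0.5, 0.5, 0.5]  # uncertainty carries no signal

auroc_informative = auroc(informative, labels)
auroc_uninformative = auroc(uninformative, labels)
```

An AUROC near 0.5 means the metric ranks hallucinated and correct outputs no better than chance, which is the sense of "non-discriminative" used in the finding above.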
Abstract
Uncertainty estimation (UE) aims to detect hallucinated outputs of large language models (LLMs) to improve their reliability. However, UE metrics often exhibit unstable performance across configurations, which significantly limits their applicability. In this work, we formalise this phenomenon as proxy failure, since most UE metrics originate from model behaviour, rather than being explicitly grounded in the factual correctness of LLM outputs. With this, we show that UE metrics become non-discriminative precisely in low-information regimes. To alleviate this, we propose Truth AnChoring (TAC), a post-hoc calibration method to remedy UE metrics, by mapping the raw scores to truth-aligned scores. Even with noisy and few-shot supervision, our TAC can support the learning of well-calibrated uncertainty estimates, and presents a practical calibration protocol. Our findings highlight the…
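The abstract does not spell out TAC's mapping, so the sketch below substitutes a generic post-hoc calibrator, Platt scaling (a one-dimensional logistic regression), to show what mapping raw scores to truth-aligned scores from a few labels can look like. It is not the authors' method; the names `fit_platt` and `calibrate`, the toy scores, and the label convention (1 = factually correct) are assumptions for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p(correct | score) = sigmoid(a * score + b) by gradient descent
    on the mean log loss. Platt scaling, used as a stand-in, not TAC itself."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    # Map a raw uncertainty score to an estimated probability of correctness.
    return sigmoid(a * score + b)


# Hypothetical few-shot calibration set: raw uncertainty scores with
# noisy correctness labels (1 = correct, 0 = hallucinated).
raw_scores = [0.2, 0.5, 0.9, 1.4, 1.8, 2.3, 2.7, 3.1]
is_correct = [1, 1, 1, 0, 1, 0, 0, 0]

a, b = fit_platt(raw_scores, is_correct)
p_low_uncertainty = calibrate(0.2, a, b)
p_high_uncertainty = calibrate(3.1, a, b)
```

The calibrator is fitted once on the small labelled set and then applied to new raw scores, so it needs no change to the underlying LLM or UE metric, which is what makes the approach post-hoc.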
