From Entropy to Calibrated Uncertainty: Training Language Models to Reason About Uncertainty
Azza Jenane, Nassim Walha, Lukas Kuhn, Florian Buettner

TL;DR
This paper introduces a three-stage post-training pipeline that teaches large language models to produce calibrated, interpretable uncertainty estimates efficiently, improving their reliability in high-stakes applications.
Contribution
It presents a novel post-training method combining entropy-based scoring, calibration, and reinforcement learning to enhance LLM uncertainty estimation.
Findings
Models achieve better calibration than baselines.
Uncertainty estimates are interpretable and computationally efficient.
Models generalize to unseen tasks without additional training.
Abstract
Large Language Models (LLMs) that can express interpretable and calibrated uncertainty are crucial in high-stakes domains. While methods to compute uncertainty post-hoc exist, they are often sampling-based and therefore computationally expensive or lack calibration. We propose a three-stage pipeline to post-train LLMs to efficiently infer calibrated uncertainty estimates for their responses. First, we compute fine-grained entropy-based uncertainty scores on the training data, capturing the distributional variability of model outputs in embedding space. Second, these scores are calibrated via Platt scaling, producing reliable and human-interpretable uncertainty signals. Finally, the target LLM is post-trained via reinforcement learning to align its policy with these calibrated signals through a verifiable reward function. Unlike post-hoc uncertainty estimation methods, our approach…
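The second and third stages of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that raw entropy scores are mapped to probabilities with standard Platt scaling (a logistic fit on a scalar score) and that the reward is one minus the absolute gap between the model's stated uncertainty and the calibrated target. The names `fit_platt`, `platt_probability`, and `reward`, the toy data, and the reward form itself are hypothetical; the abstract only states that the reward is verifiable.

```python
import math


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss.

    scores: raw uncertainty scores (e.g. entropy values), one per response.
    labels: 1 if the response was wrong, 0 if it was right (assumed labeling).
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # derivative of the log loss w.r.t. a
            grad_b += (p - y)      # derivative of the log loss w.r.t. b
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_probability(score, a, b):
    """Map a raw score to a calibrated probability of being wrong."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def reward(stated_uncertainty, calibrated_target):
    """Hypothetical reward: highest when the stated value matches the target."""
    return 1.0 - abs(stated_uncertainty - calibrated_target)


# Toy data (made up for illustration): higher entropy tends to mean "wrong".
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
target = platt_probability(0.7, a, b)
print(f"calibrated uncertainty for score 0.7: {target:.3f}")
print(f"reward if the model states 0.5: {reward(0.5, target):.3f}")
```

In this sketch the calibrated target is computed once on the training data, so the reward needs no sampling at inference time, which is consistent with the efficiency claim in the abstract.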
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
