Towards Harmonized Uncertainty Estimation for Large Language Models

Rui Li; Jing Long; Muge Qi; Heming Xia; Lei Sha; Peiyi Wang; Zhifang Sui

arXiv:2505.19073 · cs.CL · July 22, 2025

TL;DR

This paper introduces CUE, a simple method that improves uncertainty estimation in large language models by training a lightweight corrector to adjust uncertainty scores, with improvements of up to 60% over existing methods across various models and tasks.

Contribution

It proposes CUE (Corrector for Uncertainty Estimation), a lightweight correction approach that adjusts the uncertainty scores of LLMs, addressing the failure of previous methods to harmonize indication, balance, and calibration.

Findings

1. CUE achieves up to 60% improvement over existing uncertainty estimation methods.

2. The method is effective across diverse models and tasks.

3. Empirical results demonstrate better calibration and reliability.

Abstract

To facilitate robust and trustworthy deployment of large language models (LLMs), it is essential to quantify the reliability of their generations through uncertainty estimation. While recent efforts have made significant advancements by leveraging the internal logic and linguistic features of LLMs to estimate uncertainty scores, our empirical analysis highlights the pitfalls of these methods to strike a harmonized estimation between indication, balance, and calibration, which hinders their broader capability for accurate uncertainty estimation. To address this challenge, we propose CUE (Corrector for Uncertainty Estimation): A straightforward yet effective method that employs a lightweight model trained on data aligned with the target LLM's performance to adjust uncertainty scores. Comprehensive experiments across diverse models and tasks demonstrate its effectiveness, which achieves…
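The abstract describes CUE only at a high level: a lightweight model, trained on data that reflects where the target LLM succeeds or fails, adjusts an existing uncertainty score. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a logistic-regression corrector, a hypothetical two-element feature vector, toy labels marking whether the target LLM answered incorrectly, and a simple weighted average with a hypothetical mixing weight `alpha`; none of these choices is confirmed by the source.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_wrong(x, w, b):
    # Corrector output: estimated probability that the target LLM is wrong.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_corrector(features, wrong_labels, lr=0.5, epochs=500):
    # Fit logistic regression with plain batch gradient descent.
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, wrong_labels):
            err = predict_wrong(x, w, b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def corrected_uncertainty(base_score, x, w, b, alpha=0.5):
    # Assumption: combine the base score and the corrector by weighted average.
    return alpha * base_score + (1.0 - alpha) * predict_wrong(x, w, b)

# Toy data (hypothetical): each row is [base uncertainty score, task-type flag];
# the label is 1 when the target LLM's answer was wrong, 0 when it was right.
features = [[0.1, 0.0], [0.2, 0.0], [0.3, 0.0],
            [0.7, 1.0], [0.8, 1.0], [0.9, 1.0]]
wrong = [0, 0, 0, 1, 1, 1]

w, b = train_corrector(features, wrong)
low = corrected_uncertainty(0.2, [0.2, 0.0], w, b)
high = corrected_uncertainty(0.8, [0.8, 1.0], w, b)
print(low, high)
```

Because the labels come from the target LLM's own correctness, the corrector is fitted to that model's error pattern rather than to a generic notion of difficulty. In practice the features, the corrector architecture, and the combination rule would follow the paper.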

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling