From Calibration to Collaboration: LLM Uncertainty Quantification Should Be More Human-Centered

Siddartha Devic; Tejas Srinivasan; Jesse Thomason; Willie Neiswanger; Vatsal Sharan

arXiv:2506.07461 · cs.CL · June 10, 2025

Open Access

TL;DR

This paper argues that current practices in LLM uncertainty quantification (UQ) are poorly suited to supporting users who make real-world decisions, and it calls for human-centered evaluation and research practices.

Contribution

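Through an analysis of 40 LLM UQ methods, the paper identifies three prevalent practices that limit the usefulness of UQ to downstream users, and for each it proposes user-centric practices and research directions aimed at more effective human-AI collaboration.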

Findings

1) Current benchmarks lack ecological validity

2) Most methods consider only epistemic uncertainty

3) Optimizing non-user-centric metrics hampers real-world utility

Abstract

Large Language Models (LLMs) are increasingly assisting users in the real world, yet their reliability remains a concern. Uncertainty quantification (UQ) has been heralded as a tool to enhance human-LLM collaboration by enabling users to know when to trust LLM predictions. We argue that current practices for uncertainty quantification in LLMs are not optimal for developing useful UQ for human users making decisions in real-world tasks. Through an analysis of 40 LLM UQ methods, we identify three prevalent practices hindering the community's progress toward its goal of benefiting downstream users: 1) evaluating on benchmarks with low ecological validity; 2) considering only epistemic uncertainty; and 3) optimizing metrics that are not necessarily indicative of downstream utility. For each issue, we propose concrete user-centric practices and research directions that LLM UQ researchers…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Explainable Artificial Intelligence (XAI)