The Consistency Hypothesis in Uncertainty Quantification for Large Language Models

Quan Xiao; Debarun Bhattacharjya; Balaji Ganesan; Radu Marinescu; Katsiaryna Mirylenka; Nhan H Pham; Michael Glass; Junkyu Lee

arXiv:2506.21849·cs.CL·June 30, 2025

Open Access

TL;DR

This paper examines the assumption that consistency among an LLM's generations is a reliable proxy for confidence. It proposes statistical tests for that assumption and data-free methods for confidence estimation, evaluated on question answering, text summarization, and text-to-SQL.

Contribution

It formalizes the consistency hypothesis, introduces statistical tests and metrics for evaluation, and develops data-free UQ methods that leverage output similarities for better confidence estimation.

Findings

01. The consistency hypothesis holds widely across the 3 tasks and 8 benchmark datasets studied.

02. The 'Sim-Any' hypothesis is particularly effective for confidence estimation.

03. Data-free methods based on output similarity can outperform existing baselines.

Abstract

Estimating the confidence of large language model (LLM) outputs is essential for real-world applications requiring high user trust. Black-box uncertainty quantification (UQ) methods, relying solely on model API access, have gained popularity due to their practical benefits. In this paper, we examine the implicit assumption behind several UQ methods, which use generation consistency as a proxy for confidence, an idea we formalize as the consistency hypothesis. We introduce three mathematical statements with corresponding statistical tests to capture variations of this hypothesis and metrics to evaluate LLM output conformity across tasks. Our empirical investigation, spanning 8 benchmark datasets and 3 tasks (question answering, text summarization, and text-to-SQL), highlights the prevalence of the hypothesis under different settings. Among the statements, we highlight the 'Sim-Any'…
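To make the idea of "generation consistency as a proxy for confidence" concrete, here is a minimal sketch of a black-box, data-free confidence score. It is an illustration under stated assumptions, not the paper's method: the token-level Jaccard similarity, the mean aggregation, and the function names `jaccard` and `consistency_confidence` are all hypothetical choices made for this example. Each sampled generation is scored by its average similarity to the other samples drawn for the same prompt.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in for any similarity metric)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty outputs are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def consistency_confidence(samples: list[str]) -> list[float]:
    """Score each sample by its mean similarity to every other sample.

    Assumption: higher agreement with the other generations is read as
    higher confidence, which is the intuition the consistency hypothesis
    is meant to test.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to measure consistency")
    scores = []
    for i in range(n):
        sims = [jaccard(samples[i], samples[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims))
    return scores


if __name__ == "__main__":
    # Hypothetical generations for a single question.
    generations = ["Paris", "Paris", "Lyon"]
    print(consistency_confidence(generations))
```

For structured tasks such as text-to-SQL, a token-overlap metric is likely too crude; swapping `jaccard` for a task-appropriate similarity leaves the rest of the sketch unchanged.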

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)