Why Don't You Know? Evaluating the Impact of Uncertainty Sources on Uncertainty Quantification in LLMs
Maiya Goloburda, Roman Vashurin, Fedor Chernogorsky, Nurkhan Laiyk, Daniil Orel, Preslav Nakov, Maxim Panov

TL;DR
This paper investigates how different sources of uncertainty affect the performance of uncertainty quantification methods in large language models, highlighting the need for source-aware approaches.
Contribution
Introduces a new dataset categorizing uncertainty sources and systematically evaluates existing UQ methods under these conditions.
Findings
UQ methods perform well when uncertainty stems from model knowledge gaps
Performance degrades when uncertainty arises from other sources
Highlights the need for source-aware uncertainty methods
Abstract
As Large Language Models (LLMs) are increasingly deployed in real-world applications, reliable uncertainty quantification (UQ) becomes critical for safe and effective use. Most existing UQ approaches for language models aim to produce a single confidence score -- for example, estimating the probability that a model's answer is correct. However, uncertainty in natural language tasks arises from multiple distinct sources, including model knowledge gaps, output variability, and input ambiguity, which have different implications for system behavior and user interaction. In this work, we study how the source of uncertainty impacts the behavior and effectiveness of existing UQ methods. To enable controlled analysis, we introduce a new dataset that explicitly categorizes uncertainty sources, allowing systematic evaluation of UQ performance under each condition. Our experiments reveal that…
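To make the idea of evaluating UQ performance under each condition concrete, here is a minimal sketch that scores a single confidence value separately for each uncertainty source. It is not the authors' evaluation code. It assumes AUROC as the metric (the probability that a correct answer receives higher confidence than an incorrect one), and the function names, source labels, and toy records are all hypothetical, invented for illustration rather than taken from the paper's dataset or results.

```python
from collections import defaultdict


def auroc(confidences, correct):
    """Fraction of (correct, incorrect) answer pairs ranked properly by confidence.

    Ties count as half a win. Returns None if a group lacks either class.
    """
    pos = [c for c, y in zip(confidences, correct) if y]
    neg = [c for c, y in zip(confidences, correct) if not y]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auroc_by_source(records):
    """Group (source, confidence, is_correct) records and score each group."""
    groups = defaultdict(lambda: ([], []))
    for source, confidence, is_correct in records:
        groups[source][0].append(confidence)
        groups[source][1].append(is_correct)
    return {s: auroc(conf, y) for s, (conf, y) in groups.items()}


# Made-up toy data (NOT results from the paper): each record is
# (hypothetical source label, model confidence, whether the answer was correct).
toy_records = [
    ("knowledge_gap", 0.9, True),
    ("knowledge_gap", 0.8, True),
    ("knowledge_gap", 0.3, False),
    ("knowledge_gap", 0.2, False),
    ("input_ambiguity", 0.9, False),
    ("input_ambiguity", 0.4, True),
    ("input_ambiguity", 0.7, True),
    ("input_ambiguity", 0.6, False),
]

if __name__ == "__main__":
    for source, score in auroc_by_source(toy_records).items():
        print(f"{source}: AUROC = {score}")
```

Reporting one score per source, rather than a single pooled number, is what lets a gap between conditions show up at all; a pooled AUROC could mask a method that separates correct from incorrect answers under one source but not another.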
