LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information
Stef van Buuren

TL;DR
This paper evaluates whether large language models' uncertainty estimates increase with the amount of missing information, and finds that response entropy tracks missing context better than sampling-based confidence does.
Contribution
It introduces a framework for assessing LLM uncertainty under incomplete context, demonstrates that entropy is a more reliable measure than confidence, and proposes a new diagnostic.
Findings
Entropy increases with context removal, aligning with the multiple imputation analogy.
Confidence remains high despite accuracy drops, indicating it is less responsive to missing information.
The proposed diagnostic $\rho_R(\alpha)$ effectively estimates resolved uncertainty from repeated sampling.
Abstract
Large language models (LLMs) are increasingly deployed in settings where the available context is incomplete or degraded. We argue that an LLM generating answers under incomplete context can be viewed as an implicit imputer, and evaluated against a criterion from the multiple imputation (MI) literature: uncertainty should scale with the amount of missing information. We assess this criterion on SQuAD, using a controlled framework in which context availability is varied across five levels. We evaluate two answer-level uncertainty measures that can be estimated from repeated sampling: sampling-based confidence (empirical mode frequency) and response entropy. Confidence fails to reflect increasing missingness: it remains high even as accuracy collapses. Entropy, by contrast, increases with context removal, consistent with the MI analogy, and explains substantially more variance in accuracy…
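The two answer-level measures named in the abstract can be illustrated with a minimal sketch. It assumes that repeated samples for one question are available as a list of answer strings, that answers are grouped after a simple normalization (lowercasing and whitespace collapsing), and that entropy is measured in bits; the function names `normalize` and `confidence_and_entropy` are hypothetical, and none of these choices is confirmed by the paper.

```python
from collections import Counter
import math


def normalize(answer: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace so that
    # trivially different strings count as the same answer.
    return " ".join(answer.lower().split())


def confidence_and_entropy(samples):
    """Return (mode frequency, Shannon entropy in bits) for sampled answers."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(s) for s in samples)
    n = sum(counts.values())
    probs = [c / n for c in counts.values()]
    # Sampling-based confidence: empirical frequency of the most common answer.
    confidence = max(probs)
    # Response entropy of the empirical answer distribution.
    entropy = sum(-p * math.log2(p) for p in probs)
    return confidence, entropy


if __name__ == "__main__":
    full_context = ["Paris", "paris", "Paris ", "Paris"]
    no_context = ["Paris", "Lyon", "Nice", "Lille"]
    print(confidence_and_entropy(full_context))
    print(confidence_and_entropy(no_context))
```

Under this sketch, samples that all agree give confidence 1 and entropy 0, while samples spread evenly over four distinct answers give confidence 0.25 and entropy 2 bits. The mode frequency depends only on the most common answer, whereas entropy reflects how the remaining samples are spread.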
