A statistically consistent measure of semantic uncertainty using Language Models
Yi Liu

TL;DR
This paper introduces semantic spectral entropy, a statistically consistent measure of semantic uncertainty for language model outputs. It is simple to compute, works with standard pretrained language models, and needs no access to the internal generation process.
Contribution
The paper presents a novel, simple algorithm for measuring semantic uncertainty that is statistically consistent and compatible with standard pretrained language models.
Findings
Accurately estimates semantic uncertainty across different models
Robust against inherent randomness in language model outputs
Applicable without access to internal generation processes
Abstract
To address the challenge of quantifying uncertainty in the outputs generated by language models, we propose a novel measure of semantic uncertainty, semantic spectral entropy, that is statistically consistent under mild assumptions. This measure is implemented through a straightforward algorithm that relies solely on standard, pretrained language models, without requiring access to the internal generation process. Our approach imposes minimal constraints on the choice of language models, making it broadly applicable across different architectures and settings. Through comprehensive simulation studies, we demonstrate that the proposed method yields an accurate and robust estimate of semantic uncertainty, even in the presence of the inherent randomness characteristic of generative language model outputs.
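The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way an entropy over semantic clusters could be computed, not the paper's implementation. It assumes that a language model has already judged each pair of sampled responses and that those judgments are stored in a symmetric affinity matrix `W` (1 for "same meaning", 0 otherwise), that the number of clusters `k` is supplied by the caller, and that entropy uses the natural logarithm. The function names `spectral_labels` and `cluster_entropy` are hypothetical.

```python
import numpy as np


def spectral_labels(W, k):
    """Assign n items to k clusters from a symmetric affinity matrix W.

    Assumption: W[i, j] in [0, 1] encodes how strongly a language model
    judged responses i and j to be semantically equivalent.
    """
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues returned in ascending order
    X = vecs[:, :k]                # embedding from the k smallest eigenvalues
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-point initialisation for k-means.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)

    # Plain Lloyd iterations on the spectral embedding.
    for _ in range(50):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def cluster_entropy(labels):
    """Shannon entropy (in nats) of the empirical cluster proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


# Four sampled responses: the first two share one meaning, the last two another.
W_example = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
example_labels = spectral_labels(W_example, k=2)
example_entropy = cluster_entropy(example_labels)
```

In this toy case the two meanings are equally represented, so the entropy is the maximum for two clusters. If all responses fell into a single cluster the entropy would be zero, which corresponds to no semantic uncertainty.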
Taxonomy
Topics: Natural Language Processing Techniques
