TL;DR
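This paper introduces G-NLL, a computationally efficient uncertainty measure for LLMs: the negative log-likelihood of the sequence produced by a single greedy decoding, used to approximate the negative log-likelihood of the most likely output. The measure is grounded in proper scoring rules, and the paper reports state-of-the-art uncertainty estimation performance.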
Contribution
The paper proposes G-NLL, an uncertainty estimation method grounded in proper scoring rules that is simpler and more efficient than existing multi-sequence approaches.
Findings
G-NLL achieves state-of-the-art uncertainty estimation performance.
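Theoretical analysis shows that the negative log-likelihood of the most likely output sequence is a principled uncertainty measure.
G-NLL requires only a single greedy decoding pass, reducing computational cost relative to methods that generate multiple output sequences.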
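As a rough illustration of the single-pass idea, the sketch below greedy-decodes from a toy next-token distribution and sums the negative log-probabilities of the chosen tokens. It is not the authors' implementation: `greedy_nll`, `toy_model`, the `<eos>` token, and the choice to include the end-of-sequence token in the sum are all assumptions made for this example.

```python
import math


def greedy_nll(next_token_probs, prompt, eos="<eos>", max_len=20):
    """Greedy-decode one sequence and return (tokens, negative log-likelihood).

    `next_token_probs(prompt, prefix)` is a hypothetical callable that returns
    a dict mapping each candidate next token to its probability.
    """
    tokens = []
    nll = 0.0
    for _ in range(max_len):
        probs = next_token_probs(prompt, tokens)
        # Greedy step: pick the single most probable next token.
        token, p = max(probs.items(), key=lambda kv: kv[1])
        # Accumulate -log p(token | prompt, prefix).
        nll -= math.log(p)
        if token == eos:  # assumption: the EOS token counts toward the score
            break
        tokens.append(token)
    return tokens, nll


def toy_model(prompt, prefix):
    """Hand-written stand-in for an LLM's next-token distribution."""
    table = {
        (): {"Paris": 0.9, "Lyon": 0.1},
        ("Paris",): {"<eos>": 0.8, "France": 0.2},
    }
    return table[tuple(prefix)]


tokens, score = greedy_nll(toy_model, "Capital of France?")
print(tokens, round(score, 4))
```

A higher score means the model assigned lower probability to its own greedy answer, which is read as higher uncertainty. With a real LLM, the per-token probabilities would come from the model's output distribution at each decoding step rather than from a lookup table.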
Abstract
Large Language Models (LLMs) are increasingly employed in real-world applications, driving the need to evaluate the trustworthiness of their generated text. To this end, reliable uncertainty estimation is essential. Leading uncertainty estimation methods generate and analyze multiple output sequences, which is computationally expensive and impractical at scale. In this work, we inspect the theoretical foundations of these methods and explore new directions to enhance computational efficiency. Building on the framework of proper scoring rules, we find that the negative log-likelihood of the most likely output sequence constitutes a theoretically principled uncertainty measure. To approximate this alternative measure, we propose G-NLL, obtained using a single output sequence from greedy decoding. This approach streamlines uncertainty estimation while preserving theoretical rigor.…
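The abstract gives the measure only in words. One way to write it down is sketched here, using notation that is assumed for this summary rather than taken from the paper: $x$ is the input, $y$ an output sequence, $p(y \mid x)$ the model's sequence likelihood, and $\hat{y}$ the greedy-decoded sequence of length $T$.

```latex
% Target measure: negative log-likelihood of the most likely output sequence
U(x) = -\max_{y} \log p(y \mid x)

% G-NLL: the same quantity evaluated on the greedy sequence \hat{y}
\text{G-NLL}(x) = -\log p(\hat{y} \mid x)
                = -\sum_{t=1}^{T} \log p(\hat{y}_t \mid x, \hat{y}_{<t}),
\qquad
\hat{y}_t = \arg\max_{v} \, p(v \mid x, \hat{y}_{<t})
```

Because greedy decoding is not guaranteed to find the most likely sequence, $\text{G-NLL}(x) \ge U(x)$ under this notation, with equality when the greedy sequence is also the most likely one.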