Shapley Uncertainty in Natural Language Generation

Meilin Zhu; Gaojie Jin; Xiaowei Huang; Lijun Zhang

arXiv:2507.21406·cs.AI·July 30, 2025

TL;DR

This paper introduces a Shapley-based uncertainty metric for large language models that better captures semantic nuances and improves prediction of model performance in question-answering tasks.

Contribution

The paper develops a Shapley uncertainty framework that extends semantic entropy, proves that it satisfies three fundamental properties of valid uncertainty metrics, and reports that it outperforms baselines in predicting LLM performance.

Findings

01

Shapley uncertainty more accurately predicts LLM performance.

02

The framework captures continuous semantic relationships.

03

It outperforms existing baseline uncertainty measures.

Abstract

In question-answering tasks, determining when to trust the outputs is crucial to the alignment of large language models (LLMs). Kuhn et al. (2023) introduce semantic entropy as a measure of uncertainty that incorporates the linguistic invariances arising from shared meaning. It relies primarily on setting a threshold to decide whether two outputs are semantically equivalent. We propose a more nuanced framework that extends beyond such thresholding by developing a Shapley-based uncertainty metric that captures the continuous nature of semantic relationships. We establish three fundamental properties that characterize valid uncertainty metrics and prove that our Shapley uncertainty satisfies these criteria. Through extensive experiments, we demonstrate that our Shapley uncertainty predicts LLM performance more accurately than similar baseline measures on question-answering and other datasets.
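
To make the thresholding step concrete, here is a minimal sketch of a thresholded semantic-entropy baseline. It is not the paper's Shapley metric, and it is not the exact procedure of Kuhn et al. (2023). The function names, the greedy clustering rule, the Jaccard token similarity, and the threshold values are all assumptions chosen for illustration; a real system would use a learned equivalence or entailment model in place of `jaccard`.

```python
import math

def jaccard(a, b):
    """Toy similarity (assumption): token-set overlap between two answers."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0

def semantic_entropy(answers, probs, similarity, threshold=0.5):
    """Entropy over meaning clusters formed by thresholding similarity.

    An answer joins the first cluster whose representative it matches at
    or above `threshold`; otherwise it starts a new cluster.
    """
    total = sum(probs)
    clusters = []  # each entry: [representative answer, probability mass]
    for answer, p in zip(answers, probs):
        for cluster in clusters:
            if similarity(answer, cluster[0]) >= threshold:
                cluster[1] += p / total
                break
        else:
            clusters.append([answer, p / total])
    return -sum(mass * math.log(mass) for _, mass in clusters if mass > 0)

# Hypothetical sampled answers with their normalized likelihoods.
answers = ["Paris", "paris", "Lyon"]
probs = [0.5, 0.3, 0.2]

merged = semantic_entropy(answers, probs, jaccard, threshold=0.5)
unmerged = semantic_entropy(answers, probs, jaccard, threshold=1.1)
print(merged, unmerged)
```

With the lower threshold, "Paris" and "paris" fall into one cluster and the entropy drops; with a threshold no pair can reach, every answer is its own cluster and the entropy is higher. The hard cut-off is what makes the score sensitive to the chosen threshold, which is the limitation the continuous Shapley-based metric is meant to address.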
