Variability Need Not Imply Error: The Case of Adequate but Semantically Distinct Responses
Evgenia Ilia, Wilker Aziz

TL;DR
This paper challenges the assumption that semantic variability in language model responses indicates errors, proposing a new measure, PROBAR (the Probability the model assigns to Adequate Responses), that better estimates model reliability.
Contribution
The authors introduce PROBAR, which annotates sampled responses for their adequacy to the prompt (e.g., with a classifier) and estimates the probability the model assigns to adequate responses; as a reliability estimate, it outperforms semantic entropy.
Findings
PROBAR outperforms semantic entropy in estimating model reliability.
PROBAR effectively measures confidence across ambiguous and open-ended prompts.
The approach improves selective prediction in language models.
Abstract
With the broader use of language models (LMs) comes the need to estimate their ability to respond reliably to prompts (e.g., are generated responses likely to be correct?). Uncertainty quantification tools (notions of confidence and entropy, i.a.) can be used to that end (e.g., to reject a response when the model is 'uncertain'). For example, Kuhn et al. (semantic entropy; 2022b) regard semantic variation amongst sampled responses as evidence that the model 'struggles' with the prompt and that the LM is likely to err. We argue that semantic variability need not imply error, this being especially intuitive in open-ended settings, where prompts elicit multiple adequate but semantically distinct responses. Hence, we propose to annotate sampled responses for their adequacy to the prompt (e.g., using a classifier) and estimate the Probability the model assigns to Adequate Responses (PROBAR),…
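The contrast the abstract draws can be illustrated with a toy computation. The following is a minimal sketch, not the authors' implementation. It assumes the adequacy labels and the semantic-equivalence clusters for each sampled response are already available; the classifier and the clustering step are not shown. The function names and the 0.5 rejection threshold are hypothetical. PROBAR is estimated here as the fraction of sampled responses judged adequate, a plain Monte Carlo estimate. Semantic entropy is computed in its discrete form from cluster frequencies, which may differ from the exact estimators used in the paper.

```python
import math
from collections import Counter
from typing import Sequence


def probar_estimate(adequate: Sequence[bool]) -> float:
    """Monte Carlo estimate of the probability mass on adequate responses.

    Assumes adequate[i] is a (hypothetical) classifier's verdict on the
    i-th response sampled from the model for a single prompt.
    """
    if not adequate:
        raise ValueError("need at least one sampled response")
    return sum(adequate) / len(adequate)


def discrete_semantic_entropy(cluster_ids: Sequence[int]) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters.

    Assumes the samples were already grouped into semantic-equivalence
    clusters; cluster_ids[i] is the cluster of the i-th sample.
    """
    if not cluster_ids:
        raise ValueError("need at least one sampled response")
    n = len(cluster_ids)
    counts = Counter(cluster_ids)
    # sum_k p_k * log(1 / p_k), with p_k the share of samples in cluster k
    return sum((c / n) * math.log(n / c) for c in counts.values())


def accept(adequate: Sequence[bool], threshold: float = 0.5) -> bool:
    """Selective prediction: respond only if the PROBAR estimate clears a
    (hypothetical) threshold; otherwise abstain."""
    return probar_estimate(adequate) >= threshold


if __name__ == "__main__":
    # Open-ended prompt: four distinct meanings, all judged adequate.
    print(discrete_semantic_entropy([0, 1, 2, 3]))  # log(4), about 1.386
    print(probar_estimate([True, True, True, True]))  # 1.0
    # Consistent but wrong: one meaning, judged inadequate every time.
    print(discrete_semantic_entropy([0, 0, 0, 0]))  # 0.0
    print(probar_estimate([False, False, False, False]))  # 0.0
```

In the first toy case, high semantic entropy would flag the prompt as risky even though every sample is adequate. In the second, zero entropy would signal confidence in a consistently inadequate answer. An adequacy-based estimate separates these two cases, which is the intuition the abstract describes.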
Taxonomy
Topics: Computational and Text Analysis Methods · Forecasting Techniques and Applications · Supply Chain Resilience and Risk Management
Methods: OPT
