TL;DR
This paper introduces CLUE, a framework that explains model uncertainty in fact-checking by identifying the conflicting or agreeing evidence spans that drive it, with the aim of improving interpretability and trust.
Contribution
CLUE is the first method to generate natural language explanations of model uncertainty that highlight evidence interactions, and it requires no fine-tuning.
Findings
CLUE produces more faithful and consistent explanations than baseline prompting.
Human evaluators find CLUE explanations more helpful and logically coherent.
CLUE generalizes to other tasks requiring complex reasoning over evidence.
Abstract
Understanding sources of a model's uncertainty regarding its predictions is crucial for effective human-AI collaboration. Prior work proposes using numerical uncertainty or hedges ("I'm not sure, but ..."), which do not explain uncertainty that arises from conflicting evidence, leaving users unable to resolve disagreements or rely on the output. We introduce CLUE (Conflict-and-Agreement-aware Language-model Uncertainty Explanations), the first framework to generate natural language explanations of model uncertainty by (i) identifying relationships between spans of text that expose claim-evidence or inter-evidence conflicts and agreements that drive the model's predictive uncertainty in an unsupervised way, and (ii) generating explanations via prompting and attention steering that verbalize these critical interactions. Across three language models and two fact-checking datasets, we show…