MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
Gabrielle Kaili-May Liu, Gal Yona, Avi Caciularu, Idan Szpektor, Tim G. J. Rudner, Arman Cohan

TL;DR
This paper investigates how well large language models communicate their uncertainty truthfully, finds they often fail, and introduces MetaFaith, a new prompt-based method that significantly improves their faithful uncertainty expression.
Contribution
The paper is the first systematic study of faithful uncertainty calibration in LLMs and proposes MetaFaith, a novel prompt-based approach inspired by human metacognition, to improve calibration.
Findings
LLMs largely fail at faithful uncertainty expression.
Standard prompts provide marginal improvements.
MetaFaith improves faithful calibration by up to 61% and achieves an 83% win rate over the original generations, as judged by humans.
Abstract
A critical component in the trustworthiness of LLMs is reliable uncertainty communication, yet LLMs often use assertive language when conveying false claims, leading to over-reliance and eroded trust. We present the first systematic study of faithful confidence calibration of LLMs, benchmarking models' ability to use linguistic expressions of uncertainty that faithfully reflect their intrinsic uncertainty, across a comprehensive array of models, datasets, and prompting strategies. Our results demonstrate that LLMs largely fail at this task, and that existing interventions are insufficient: standard prompt approaches provide only marginal gains, and existing, factuality-based calibration techniques can even harm faithful calibration. To address this critical gap, we introduce MetaFaith, a novel prompt-based calibration approach inspired by human metacognition. We…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
