Hallucinations Undermine Trust; Metacognition is a Way Forward
Gal Yona, Mor Geva, Yossi Matias

TL;DR
The paper argues that hallucinations in generative AI undermine trust and proposes metacognition, specifically faithful uncertainty, as a way to improve reliability by having models convey the difference between what they know and what they do not.
Contribution
It introduces the concept of faithful uncertainty as a form of metacognition to help LLMs better identify and communicate their own uncertainty, reducing hallucinations.
Findings
Hallucinations are often confident errors without proper qualification.
Most factuality gains have come from expanding the model's knowledge boundary (encoding more facts), not from improving awareness of that boundary.
Faithful uncertainty aligns linguistic expressions with intrinsic model uncertainty.
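As a rough illustration of the last finding, the sketch below treats faithful uncertainty as a small gap between how decisive a response sounds and how confident the model actually is. Everything in it is an assumption made for illustration, not the authors' method: the hedge phrases, the decisiveness scores attached to them, the use of agreement among sampled answers as a proxy for intrinsic confidence, and the function names.

```python
# Hypothetical hedge phrases, each paired with an assumed decisiveness score.
HEDGES = [
    (0.95, "I'm confident that"),
    (0.75, "I believe that"),
    (0.50, "I think, but am not sure, that"),
    (0.25, "I'm unsure, but possibly"),
]


def intrinsic_confidence(samples, answer):
    """Fraction of sampled answers that agree with `answer` (a simple proxy)."""
    return sum(s == answer for s in samples) / len(samples)


def hedge_for(confidence):
    """Pick the hedge whose assumed decisiveness is closest to `confidence`."""
    return min(HEDGES, key=lambda h: abs(h[0] - confidence))


def faithfulness_gap(decisiveness, confidence):
    """Absolute mismatch between expressed and intrinsic confidence."""
    return abs(decisiveness - confidence)


# Four hypothetical samples for the same question; three of them agree.
samples = ["Paris", "Lyon", "Paris", "Paris"]
conf = intrinsic_confidence(samples, "Paris")
decisiveness, phrase = hedge_for(conf)
print(f"{phrase} the answer is Paris (gap={faithfulness_gap(decisiveness, conf):.2f})")
```

Under these assumptions, a response that asserted the answer with the most decisive phrase would have a larger gap than one using the matched hedge, which is the kind of mismatch faithful uncertainty is meant to remove.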
Abstract
Despite significant strides in factual reliability, errors -- often termed hallucinations -- remain a major concern for generative AI, especially as LLMs are increasingly expected to be helpful in more complex or nuanced setups. Yet even in the simplest setting -- factoid question-answering with clear ground truth -- frontier models without external tools continue to hallucinate. We argue that most factuality gains in this domain have come from expanding the model's knowledge boundary (encoding more facts) rather than improving awareness of that boundary (distinguishing known from unknown). We conjecture that the latter is inherently difficult: models may lack the discriminative power to perfectly separate truths from errors, creating an unavoidable tradeoff between eliminating hallucinations and preserving utility. This tradeoff dissolves under a different framing. If we understand…
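The tradeoff conjectured in the abstract can be made concrete with a toy example. The sketch below assumes a model that either answers or abstains based on a confidence threshold; the confidence values, correctness labels, and function name are invented for illustration and do not come from the paper. Because one wrong answer has higher confidence than some correct ones, no threshold separates truths from errors perfectly.

```python
# Toy data: (model confidence, whether the answer is actually correct).
# Values are invented for illustration only.
ANSWERS = [
    (0.9, True),
    (0.8, True),
    (0.7, False),
    (0.6, True),
    (0.4, False),
    (0.3, True),
]


def rates(threshold):
    """Answer only when confidence >= threshold; otherwise abstain.

    Returns (hallucination_rate, utility), both as fractions of all questions:
    wrong answers given, and correct answers given.
    """
    answered = [ok for c, ok in ANSWERS if c >= threshold]
    n = len(ANSWERS)
    hallucination_rate = sum(not ok for ok in answered) / n
    utility = sum(ok for ok in answered) / n
    return hallucination_rate, utility


for t in (0.0, 0.5, 0.75):
    h, u = rates(t)
    print(f"threshold={t:.2f}  hallucinations={h:.2f}  utility={u:.2f}")
```

In this toy setting, raising the threshold far enough to remove every wrong answer also discards two of the four correct ones, which is the shape of the tradeoff the abstract describes for binary answer-or-abstain behavior.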
