Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection
Yihao Xue, Kristjan Greenewald, Youssef Mroueh, Baharan Mirzasoleiman

TL;DR
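This paper studies black-box hallucination detection in large language models. It shows that self-consistency methods already perform close to a supervised oracle, leaving little room for improvement within that paradigm, and proposes a cost-effective cross-model detection approach supported by theoretical insights.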
Contribution
It introduces a cross-model consistency checking method and a two-stage detection algorithm that reduces costs while improving detection accuracy.
Findings
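Cross-model consistency checking with an additional verifier LLM improves oracle performance over purely self-consistency-based methods.
The two-stage method reduces computational cost by calling the verifier model only for a subset of cases.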
Theoretical analysis provides new geometric insights into detection methods.
Abstract
Large Language Models (LLMs) suffer from hallucination problems, which hinder their reliability in sensitive applications. In the black-box setting, several self-consistency-based techniques have been proposed for hallucination detection. We empirically study these techniques and show that they achieve performance close to that of a supervised (still black-box) oracle, suggesting little room for improvement within this paradigm. To address this limitation, we explore cross-model consistency checking between the target model and an additional verifier LLM. With this extra information, we observe improved oracle performance compared to purely self-consistency-based methods. We then propose a budget-friendly, two-stage detection algorithm that calls the verifier model only for a subset of cases. It dynamically switches between self-consistency and cross-consistency based on an uncertainty…
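The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it states the exact uncertainty criterion, so the interval rule below is a hypothetical stand-in, not the authors' algorithm. All names (`two_stage_detect`, `cross_score_fn`, `low`, `high`, `decision_threshold`) and the convention that a higher score means more inconsistency are assumptions made for illustration.

```python
from typing import Callable, Tuple


def two_stage_detect(
    self_score: float,
    cross_score_fn: Callable[[], float],
    low: float = 0.3,
    high: float = 0.7,
    decision_threshold: float = 0.5,
) -> Tuple[bool, bool]:
    """Return (flagged_as_hallucination, verifier_was_called).

    Assumption: scores lie in [0, 1] and a higher score means the sampled
    answers disagree more, i.e. a hallucination is more likely.
    """
    # Stage 1: trust the self-consistency score when it is confident.
    if self_score <= low:
        return False, False
    if self_score >= high:
        return True, False

    # Stage 2: the score is in the uncertain band, so pay for the verifier.
    # cross_score_fn is a hypothetical callback that queries the verifier LLM
    # and returns a cross-model inconsistency score.
    cross_score = cross_score_fn()
    return cross_score >= decision_threshold, True


if __name__ == "__main__":
    # Toy scores only; no real model is queried here.
    for s in (0.1, 0.5, 0.9):
        print(s, two_stage_detect(s, lambda: 0.8))
```

Passing the verifier as a callback means its cost is incurred only when the first stage is uncertain, which is the budget-saving property the abstract describes. In this sketch, widening the `[low, high]` band sends more cases to the verifier, and narrowing it sends fewer.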
Taxonomy
Topics: Hallucinations in medical conditions
