From Out-of-Distribution Detection to Hallucination Detection: A Geometric View
Litian Liu, Reza Pourreza, Yubing Jian, Yao Qin, Roland Memisevic

TL;DR
This paper proposes a novel approach to hallucination detection in large language models by framing it as an out-of-distribution detection problem, enabling training-free and scalable safety solutions.
Contribution
The paper introduces a geometric perspective that adapts OOD detection techniques to language models, improving hallucination detection, especially on reasoning tasks.
Findings
OOD-based detectors achieve strong accuracy in hallucination detection on reasoning tasks
Training-free, single-sample detection methods are effective
Reframing hallucination detection as OOD detection enhances scalability and safety
Abstract
Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods achieve strong performance in question-answering tasks, they remain less effective on tasks requiring reasoning. In this work, we revisit hallucination detection through the lens of out-of-distribution (OOD) detection, a well-studied problem in areas like computer vision. Treating next-token prediction in language models as a classification task allows us to apply OOD techniques, provided appropriate modifications are made to account for the structural differences in large language models. We show that OOD-based approaches yield training-free, single-sample-based detectors, achieving strong accuracy in hallucination detection for reasoning tasks. Overall, our work suggests that reframing hallucination…
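To make the "next-token prediction as classification" framing concrete, here is a minimal sketch of two classic OOD scores from the vision literature, maximum softmax probability and the energy (log-sum-exp) score, applied per token and then aggregated over a generated sequence. These are generic baselines chosen for illustration, not the geometric detector or the LLM-specific modifications the paper proposes. The function names (`msp_scores`, `energy_scores`, `sequence_score`), the aggregation rule, and the toy logits are all assumptions.

```python
import numpy as np


def msp_scores(logits: np.ndarray) -> np.ndarray:
    """Maximum softmax probability per token (lower = more OOD-like)."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)


def energy_scores(logits: np.ndarray) -> np.ndarray:
    """Log-sum-exp of the logits per token (lower = more OOD-like)."""
    peak = logits.max(axis=-1, keepdims=True)
    lse = peak + np.log(np.exp(logits - peak).sum(axis=-1, keepdims=True))
    return lse.squeeze(-1)


def sequence_score(token_scores: np.ndarray, reduce: str = "min") -> float:
    """Collapse per-token scores into one score for the whole generation.

    The choice of "min" or "mean" is an illustrative assumption.
    """
    if reduce == "min":
        return float(token_scores.min())
    if reduce == "mean":
        return float(token_scores.mean())
    raise ValueError(f"unknown reduce mode: {reduce}")


# Toy logits with shape (num_tokens, vocab_size); not taken from a real model.
confident = np.array([[10.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]])
uncertain = np.array([[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])

for name, logits in [("confident", confident), ("uncertain", uncertain)]:
    print(name, sequence_score(msp_scores(logits)), sequence_score(energy_scores(logits)))
```

Both scores need only the logits of a single generation, so a detector built this way is training-free and single-sample in the sense the abstract describes. In practice a decision threshold on the sequence score would have to be calibrated on held-out data.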
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Misinformation and Its Impacts · Mental Health via Writing
