Principled Detection of Hallucinations in Large Language Models via Multiple Testing
Jiawei Li, Akshayaa Magesh, Venugopal V. Veeravalli

TL;DR
This paper introduces a statistically principled method for detecting hallucinations in large language models by framing it as a hypothesis testing problem and using conformal p-values for calibrated detection.
Contribution
The paper proposes a multiple-testing-inspired method that aggregates several hallucination scores through conformal p-values, aiming for more robust and reliable detection across models and datasets.
Findings
The method achieves better false alarm control than existing detectors.
Experiments show robustness across diverse models and datasets.
The approach provides calibrated detection with theoretical guarantees.
Abstract
While Large Language Models (LLMs) have emerged as powerful foundational models to solve a variety of tasks, they have also been shown to be prone to hallucinations, i.e., generating responses that sound confident but are actually incorrect or even nonsensical. Existing hallucination detectors propose a wide range of empirical scoring rules, but their performance varies across models and datasets, and it is hard to determine which ones to rely on in practice or to treat as a reliable detector. In this work, we formulate the problem of detecting hallucinations as a hypothesis testing problem and draw parallels with the problem of out-of-distribution detection in machine learning models. We then propose a multiple-testing-inspired method that systematically aggregates multiple evaluation scores via conformal p-values, enabling calibrated detection with controlled false alarm rate.…
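To make the idea of aggregating scores via conformal p-values concrete, here is a minimal Python sketch. It is not the authors' method: the aggregation rule (a Bonferroni correction on the smallest p-value), the function names, and the score orientation (higher score means more evidence of hallucination) are all assumptions made for illustration. The null hypothesis is taken to be "the response is not hallucinated", and each calibration list is assumed to hold scores computed on responses known to be non-hallucinated.

```python
from typing import Sequence


def conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Conformal p-value of one test score against null calibration scores.

    Assumes higher scores indicate stronger evidence of hallucination, so the
    p-value counts calibration scores at least as large as the test score.
    """
    n = len(calibration_scores)
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least_as_extreme) / (n + 1)


def flag_hallucination(
    calibration: Sequence[Sequence[float]],
    test_scores: Sequence[float],
    alpha: float = 0.1,
) -> bool:
    """Combine K scoring rules with a Bonferroni correction (illustrative choice).

    calibration[k] holds null scores for the k-th scoring rule and
    test_scores[k] is that rule's score on the response under test.
    Returns True when the response is flagged as a hallucination.
    """
    k = len(test_scores)
    p_values = [
        conformal_p_value(cal, score)
        for cal, score in zip(calibration, test_scores)
    ]
    # Flag if any single scoring rule is significant at level alpha / K.
    return min(p_values) <= alpha / k


if __name__ == "__main__":
    # Hypothetical calibration data: two scoring rules, 99 null scores each.
    cal = [[i / 100 for i in range(99)], [i / 100 for i in range(99)]]
    print(flag_hallucination(cal, [0.99, 0.20]))  # one rule is extreme
    print(flag_hallucination(cal, [0.50, 0.50]))  # neither rule is extreme
```

Bonferroni is used here only because it keeps the false alarm rate at or below `alpha` without assumptions on how the scoring rules depend on one another; other combination rules trade that conservatism for power.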
