The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems
Debu Sinha

TL;DR
This paper examines the limits of embedding-based hallucination detection in RAG systems. Embedding methods that succeed on synthetic hallucinations fail on real ones, while GPT-4 identifies them far more reliably through reasoning.
Contribution
The study introduces a conformal prediction framework for hallucination detection and uncovers the 'Semantic Illusion' phenomenon, showing the gap between surface-level detection and reasoning-based understanding.
Findings
Embedding methods achieve 95% coverage with 0% FPR on synthetic hallucinations.
Embedding methods fail catastrophically on real hallucinations, with 100% FPR at target coverage.
GPT-4 achieves 7% FPR, demonstrating reasoning can overcome surface-level detection limits.
Abstract
Retrieval-Augmented Generation (RAG) systems remain susceptible to hallucinations despite grounding in retrieved evidence. While current detection methods leverage embedding similarity and natural language inference (NLI), their reliability in safety-critical settings remains unproven. We apply conformal prediction to RAG hallucination detection, transforming heuristic scores into decision sets with finite-sample coverage guarantees (1-alpha). Using calibration sets of n=600, we demonstrate a fundamental dichotomy: on synthetic hallucinations (Natural Questions), embedding methods achieve 95% coverage with 0% False Positive Rate (FPR). However, on real hallucinations from RLHF-aligned models (HaluEval), the same methods fail catastrophically, yielding 100% FPR at target coverage. We analyze this failure through the lens of distributional tails, showing that while NLI models achieve…
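The calibration step described in the abstract can be illustrated with a minimal split-conformal sketch. It is not the paper's implementation. It assumes that a higher score means "more likely hallucinated", that "coverage" means the fraction of hallucinated responses that get flagged, and that FPR is the fraction of faithful responses that get flagged. The function names `conformal_threshold` and `detection_rates` are hypothetical, and the Gaussian scores are made-up stand-ins for real detector outputs.

```python
import math
import numpy as np


def conformal_threshold(cal_scores, alpha):
    """Return a threshold tau from calibration scores of known hallucinations.

    Under exchangeability, a fresh hallucination scores >= tau with
    probability at least 1 - alpha (finite-sample guarantee).
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    # Rank of the lower conformal quantile: floor(alpha * (n + 1)).
    k = math.floor(alpha * (n + 1))
    if k < 1:
        # Too few calibration points for this alpha: flag everything.
        return -math.inf
    return float(scores[k - 1])


def detection_rates(tau, halluc_scores, faithful_scores):
    """Empirical coverage on hallucinations and FPR on faithful responses."""
    coverage = float(np.mean(np.asarray(halluc_scores) >= tau))
    fpr = float(np.mean(np.asarray(faithful_scores) >= tau))
    return coverage, fpr


# Illustrative run on synthetic scores (assumption: not data from the paper).
rng = np.random.default_rng(0)
cal = rng.normal(loc=2.0, scale=1.0, size=600)  # n = 600, as in the abstract
tau = conformal_threshold(cal, alpha=0.05)
coverage, fpr = detection_rates(
    tau,
    halluc_scores=rng.normal(loc=2.0, scale=1.0, size=2000),
    faithful_scores=rng.normal(loc=0.0, scale=1.0, size=2000),
)
print(f"tau={tau:.3f} coverage={coverage:.3f} fpr={fpr:.3f}")
```

The guarantee constrains coverage only. If the scores of faithful and hallucinated responses overlap heavily, the threshold needed to reach the target coverage can fall low enough that most faithful responses are flagged as well, which is one way a high FPR at target coverage can arise.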
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Misinformation and Its Impacts
