The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems

Debu Sinha

arXiv:2512.15068 · cs.LG · December 22, 2025

TL;DR

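This paper uses conformal prediction to test the limits of embedding-based hallucination detection in RAG systems. Embedding methods hold up on synthetic hallucinations but fail on real hallucinations from RLHF-aligned models, reaching 100% FPR at target coverage, whereas GPT-4 reaches 7% FPR, which suggests that reasoning can catch what surface-level similarity misses.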

Contribution

The study applies conformal prediction to RAG hallucination detection, turning heuristic scores into decision sets with finite-sample coverage guarantees, and identifies the 'Semantic Illusion': the gap between surface-level detection and reasoning-based understanding.

Findings

01

On synthetic hallucinations (Natural Questions), embedding methods achieve 95% coverage with 0% FPR.

02

On real hallucinations from RLHF-aligned models (HaluEval), the same embedding methods fail catastrophically, yielding 100% FPR at target coverage.

03

GPT-4 achieves 7% FPR, demonstrating reasoning can overcome surface-level detection limits.

Abstract

Retrieval-Augmented Generation (RAG) systems remain susceptible to hallucinations despite grounding in retrieved evidence. While current detection methods leverage embedding similarity and natural language inference (NLI), their reliability in safety-critical settings remains unproven. We apply conformal prediction to RAG hallucination detection, transforming heuristic scores into decision sets with finite-sample coverage guarantees (1-alpha). Using calibration sets of n=600, we demonstrate a fundamental dichotomy: on synthetic hallucinations (Natural Questions), embedding methods achieve 95% coverage with 0% False Positive Rate (FPR). However, on real hallucinations from RLHF-aligned models (HaluEval), the same methods fail catastrophically, yielding 100% FPR at target coverage. We analyze this failure through the lens of distributional tails, showing that while NLI models achieve…
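
The calibration step the abstract describes can be illustrated with a minimal split-conformal sketch. This is not the paper's implementation: the function names (conformal_threshold, flag_rate), the convention that a higher score means "more likely hallucinated", and the Gaussian toy scores are all assumptions made for illustration. The threshold is chosen from hallucinated calibration examples so that a new hallucination is flagged with probability at least 1-alpha under exchangeability; FPR is then the fraction of faithful responses that the same threshold flags.

```python
import math
import random


def conformal_threshold(cal_scores, alpha):
    """Return a threshold tau such that a new exchangeable score satisfies
    P(score >= tau) >= 1 - alpha (finite-sample, split-conformal style).

    cal_scores: detector scores on calibration examples known to be
    hallucinated (assumed convention: higher = more suspicious).
    """
    n = len(cal_scores)
    # Allow at most k calibration scores strictly below the threshold,
    # with k / (n + 1) <= alpha.
    k = math.floor(alpha * (n + 1))
    if k < 1:
        # Too few calibration points for this alpha: flag everything.
        return float("-inf")
    return sorted(cal_scores)[k - 1]


def flag_rate(scores, tau):
    """Fraction of scores flagged as hallucinated (score >= tau)."""
    return sum(s >= tau for s in scores) / len(scores)


# Toy data (hypothetical): hallucinated and faithful scores overlap heavily.
rng = random.Random(0)
cal_halluc = [rng.gauss(1.0, 1.0) for _ in range(600)]    # n = 600 calibration
test_halluc = [rng.gauss(1.0, 1.0) for _ in range(2000)]
test_faithful = [rng.gauss(0.0, 1.0) for _ in range(2000)]

tau = conformal_threshold(cal_halluc, alpha=0.05)
coverage = flag_rate(test_halluc, tau)   # share of hallucinations caught
fpr = flag_rate(test_faithful, tau)      # share of faithful answers flagged

print(f"tau={tau:.3f} coverage={coverage:.3f} fpr={fpr:.3f}")
```

The guarantee constrains coverage only. When the two score distributions overlap, as in this toy setup, meeting the coverage target forces a low threshold and the FPR rises accordingly, which is the kind of trade-off the findings above quantify.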

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Misinformation and Its Impacts