PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts

Khizar Hussain; Murat Kantarcioglu

arXiv:2605.17028·cs.CL·May 19, 2026

TL;DR

This paper critically examines hallucination detection in large language models, revealing that many reported successes are due to dataset artifacts and introducing DRIFT as a more reliable detection method.

Contribution

The study exposes benchmark artifacts that distort the evaluation of hallucination detection and proposes DRIFT, a supervised probe, as an approach aimed at genuine detection.

Findings

01

Most detection success is due to dataset artifacts, not model understanding.

02

Many established baselines perform near chance when artifacts are controlled.

03

DRIFT and SAPLMA are effective supervised probes for detection.

Abstract

Large language models (LLMs) hallucinate with confidence: their outputs can be fluent, authoritative, and simply wrong. In medical, legal, and scientific applications this failure causes direct harm, and detecting it from internal model states offers a path to safer deployment. A growing body of work reports that this problem is increasingly tractable, with recent methods achieving high detection performance on widely used benchmarks. We show, however, that much of this apparent progress does not survive scrutiny. Four of the six corpora embed the ground-truth answer directly in the input prompt. A naïve text-similarity baseline we call TxTemb exploits this to achieve near-perfect detection scores without any access to model internals. To measure what genuine detection capability remains once these artifacts are controlled, we conduct a large-scale evaluation spanning…
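
To make the artifact concrete, here is a minimal sketch of how a text-only similarity score could separate faithful from hallucinated responses when the prompt already contains the ground-truth answer. It is not the authors' TxTemb implementation, whose details are not given in this excerpt: it assumes a simple bag-of-words cosine similarity as a stand-in for whatever text representation the paper uses, and the function names and the example strings are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(prompt, response):
    """Hypothetical artifact score: how much the response overlaps the prompt.

    Assumption: when the prompt embeds the ground-truth answer, a faithful
    response tends to overlap it more than a hallucinated one does. No model
    internals are consulted.
    """
    return cosine_similarity(bag_of_words(prompt), bag_of_words(response))


# Illustrative (made-up) example in which the prompt leaks the answer.
prompt = (
    "Context: The Eiffel Tower is located in Paris. "
    "Question: Where is the Eiffel Tower?"
)
faithful = "The Eiffel Tower is located in Paris."
hallucinated = "It stands in Rome."

faithful_score = similarity_score(prompt, faithful)
hallucinated_score = similarity_score(prompt, hallucinated)
```

In this toy case the faithful response scores higher than the hallucinated one purely from word overlap with the prompt, which is the kind of shortcut a benchmark has to rule out before a detection score can be credited to model internals.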
