Sanity Checks for Long-Form Hallucination Detection

Geigh Zollicoffer, Minh Vu, Hongli Zhan, Raymond Li, Manish Bhattarai

arXiv:2605.08346 · cs.CL · May 12, 2026

TL;DR

This paper introduces a methodology for testing whether hallucination detectors for language models evaluate the reasoning trace itself or merely rely on surface cues from the final answer. It also shows that simple lexical features can be effective once those cues are controlled for.

Contribution

The paper proposes controlled-invariance tests for evaluating hallucination detectors and introduces TRACT, a lightweight scorer built on lexical features that is robust and competitive.

Findings

01. Controlled-invariance tests reveal reliance on answer artifacts.

02. TRACT achieves strong robustness with simple lexical features.

03. Effective detection does not necessarily require complex models.

Abstract

Hallucination detection methods for large language models increasingly operate on chain-of-thought reasoning traces, yet it remains unclear whether they evaluate the reasoning itself or merely exploit surface correlates of the final answer. We introduce a controlled-invariance methodology that exposes this distinction through two oracle tests: Force, which replaces each response's final answer with the ground truth while preserving the reasoning trace, and Remove, which strips answer-announcement steps while leaving the rest of the trajectory intact. Together, these tests reveal whether a detector's predictive power derives from answer-level artifacts rather than from the structure or validity of the intermediate reasoning. We further show that once these artifacts are controlled for, effective detection does not necessarily require complex learned representations: TRACT, a lightweight scorer built on lexical…
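
To make the two oracle tests concrete, here is a minimal Python sketch of how Force and Remove might be applied to a single response. It is an illustration under stated assumptions, not the authors' implementation: the `Response` container, the `ANNOUNCE` pattern used to spot answer-announcement steps, and the `score_shift` helper are all hypothetical names introduced here, and the paper's actual criterion for what counts as an announcement step is not given in the abstract.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List

# Hypothetical heuristic for "answer-announcement" steps (an assumption;
# the abstract does not say how such steps are identified).
ANNOUNCE = re.compile(r"\b(final answer|the answer is)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Response:
    """Assumed representation: a reasoning trace plus a final answer."""
    steps: List[str]
    final_answer: str


def force(resp: Response, ground_truth: str) -> Response:
    """Force: swap in the ground-truth answer, leave the trace untouched."""
    return replace(resp, final_answer=ground_truth)


def remove(resp: Response) -> Response:
    """Remove: drop answer-announcement steps, keep the other steps in order."""
    kept = [s for s in resp.steps if not ANNOUNCE.search(s)]
    return replace(resp, steps=kept)


def score_shift(detector: Callable[[Response], float],
                original: Response, edited: Response) -> float:
    """Change in a detector's score caused by an edit to the response."""
    return detector(edited) - detector(original)


# Toy example: the trace contains an arithmetic slip (22 * 2 is 44, not 45).
resp = Response(
    steps=["17 + 5 = 22", "22 * 2 = 45", "So the answer is 45."],
    final_answer="45",
)
forced = force(resp, "44")
removed = remove(resp)


def answer_only_detector(r: Response) -> float:
    """Toy detector that looks only at the final answer, never the trace."""
    return 0.0 if r.final_answer == "44" else 1.0


shift = score_shift(answer_only_detector, resp, forced)
```

In this toy setup the trace is identical before and after Force, so a detector that truly scored the reasoning should give the same score to both. The answer-only detector instead changes its score, which is the kind of answer-level dependence the tests are meant to surface.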
