DRIFT: Detecting Representational Inconsistencies for Factual Truthfulness

Rohan Bhatnagar; Youran Sun; Chi Andrew Zhang; Yixin Wen; Haizhao Yang

arXiv:2601.14210·cs.CL·January 30, 2026

TL;DR

DRIFT introduces a lightweight, parallelizable probe that detects factual hallucinations in LLMs by reading confidence signals from hidden states, enabling faster and more accurate truthfulness verification.

Contribution

The paper presents a low-overhead probing method that detects hallucinations in LLMs from their hidden states, and an LLM router that answers confident queries immediately while delegating uncertain ones to stronger models.
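The routing idea (answer directly when confidence is high, otherwise hand the query to a stronger model) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: `route`, `RoutedAnswer`, `confidence_fn`, the two model callables, and the `threshold` value are all hypothetical names and settings chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutedAnswer:
    text: str          # the answer that was returned
    model: str         # "small" or "strong": which model produced it
    confidence: float  # probe score that drove the decision


def route(
    query: str,
    confidence_fn: Callable[[str], float],  # hypothetical: probe score in [0, 1]
    small_model: Callable[[str], str],      # hypothetical: cheap default model
    strong_model: Callable[[str], str],     # hypothetical: stronger fallback model
    threshold: float = 0.5,                 # assumed cutoff, to be tuned on held-out data
) -> RoutedAnswer:
    """Answer with the small model when the probe is confident, else delegate."""
    confidence = confidence_fn(query)
    if confidence >= threshold:
        return RoutedAnswer(small_model(query), "small", confidence)
    return RoutedAnswer(strong_model(query), "strong", confidence)
```

In practice the threshold trades cost against accuracy: a higher value sends more queries to the stronger model.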

Findings

01

Achieves state-of-the-art AUROC on 10 of 12 settings across four QA benchmarks and three LLM families.

02

Improves on prior methods by up to 13 points.

03

Generalizes across dataset shifts without retraining.

Abstract

LLMs often produce fluent but incorrect answers, yet detecting such hallucinations typically requires multiple sampling passes or post-hoc verification, adding significant latency and cost. We hypothesize that intermediate layers encode confidence signals that are lost in the final output layer, and propose a lightweight probe to read these signals directly from hidden states. The probe adds less than 0.1% computational overhead and can run fully in parallel with generation, enabling hallucination detection before the answer is produced. Building on this, we develop an LLM router that answers confident queries immediately while delegating uncertain ones to stronger models. Despite its simplicity, our method achieves SOTA AUROC on 10 out of 12 settings across four QA benchmarks and three LLM families, with gains of up to 13 points over prior methods, and generalizes across dataset…
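To make the idea of probing hidden states concrete, here is a minimal sketch that trains a probe on hidden-state vectors and scores it with AUROC. Everything in it is an assumption made for illustration: the abstract does not specify the probe architecture, so a plain logistic-regression probe is swapped in, and the "hidden states" are synthetic Gaussian vectors rather than activations from a real LLM. The function names are hypothetical.

```python
import numpy as np


def train_linear_probe(hidden, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe by gradient descent (assumed probe form)."""
    n, d = hidden.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(hidden @ w + b)))
        grad = probs - labels              # gradient of the log loss w.r.t. the logits
        w -= lr * (hidden.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_confidence(hidden, w, b):
    """Probe score in (0, 1); higher means 'more likely correct'."""
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b)))


def auroc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


# Synthetic stand-in for intermediate-layer activations: correct (1) and
# incorrect (0) answers are shifted in opposite directions along one axis.
rng = np.random.default_rng(0)
d = 16
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=400)
hidden = rng.normal(size=(400, d)) + 2.0 * (2 * labels[:, None] - 1) * direction

# Train on the first 300 examples, evaluate on the remaining 100.
w, b = train_linear_probe(hidden[:300], labels[:300])
test_auroc = auroc(probe_confidence(hidden[300:], w, b), labels[300:])
print(f"held-out AUROC: {test_auroc:.3f}")
```

A linear probe of this size costs one dot product per example, which is the kind of overhead that is negligible next to a forward pass through the model; the numbers this toy prints say nothing about the paper's reported results.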

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks