The Energy of Falsehood: Detecting Hallucinations via Diffusion Model Likelihoods

Arpit Singh Gautam; Kailash Talreja; Saurabh Jha

arXiv:2602.11364 · cs.CL · February 13, 2026

TL;DR

This paper introduces DiffuTruth, an unsupervised method for detecting hallucinations in large language models. Drawing on diffusion model likelihoods and thermodynamic principles, it measures the semantic energy and stability of a claim to judge whether the claim is factual.

Contribution

The paper presents a thermodynamics-inspired framework and metrics for fact verification that outperform existing methods in unsupervised hallucination detection and zero-shot generalization.

Findings

01 Achieves state-of-the-art AUROC of 0.725 on FEVER

02 Outperforms baselines by 1.5% in AUROC

03 Outperforms baselines by over 4% on HOVER dataset
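
The findings above are reported in AUROC. For readers unfamiliar with the metric, the following is a minimal sketch of how it can be computed from detector scores and binary labels, using the pairwise-ranking definition. The function name `auroc` and the toy inputs are illustrative assumptions, not taken from the paper.

```python
def auroc(scores, labels):
    """Pairwise-ranking AUROC: the probability that a randomly chosen
    positive example receives a higher score than a randomly chosen
    negative one, with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): three of the four positive/negative
# pairs are ranked correctly.
print(auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.725 means the detector ranks a true claim above a false one in roughly 72.5% of such pairs.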

Abstract

Large Language Models (LLMs) frequently hallucinate plausible but incorrect assertions, a vulnerability often missed by uncertainty metrics when models are confidently wrong. We propose DiffuTruth, an unsupervised framework that reconceptualizes fact verification via non-equilibrium thermodynamics, positing that factual truths act as stable attractors on a generative manifold while hallucinations are unstable. We introduce the Generative Stress Test, in which claims are corrupted with noise and reconstructed using a discrete text diffusion model. We define Semantic Energy, a metric that uses an NLI critic to measure the semantic divergence between the original claim and its reconstruction. Unlike vector space errors, Semantic Energy isolates deep factual contradictions. We further propose a Hybrid Calibration that fuses this stability signal with discriminative confidence. Extensive experiments on FEVER…
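
The pipeline described in the abstract (corrupt, reconstruct, score with a critic, fuse with confidence) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the masking noise, the averaging over samples, the convex-combination fusion, the weight `alpha`, and the stand-ins `toy_reconstruct` and `toy_critic` (which replace the discrete text diffusion model and the NLI critic) are all hypothetical.

```python
import random

MASK = "[MASK]"


def corrupt(tokens, noise_level, rng):
    """Mask each token independently with probability noise_level (assumed noise model)."""
    return [MASK if rng.random() < noise_level else t for t in tokens]


def semantic_energy(claim, reconstruct, contradiction_prob,
                    noise_level=0.5, n_samples=8, seed=0):
    """Average contradiction probability between a claim and its noisy reconstructions.

    reconstruct: callable mapping masked tokens to filled-in tokens
                 (stand-in for a text diffusion model).
    contradiction_prob: callable (premise, hypothesis) -> value in [0, 1]
                        (stand-in for an NLI critic).
    """
    rng = random.Random(seed)
    tokens = claim.split()
    total = 0.0
    for _ in range(n_samples):
        rebuilt = " ".join(reconstruct(corrupt(tokens, noise_level, rng)))
        total += contradiction_prob(claim, rebuilt)
    return total / n_samples


def hybrid_score(energy, confidence, alpha=0.5):
    """Hypothetical fusion: weighted mix of stability (1 - energy) and confidence."""
    return alpha * (1.0 - energy) + (1.0 - alpha) * confidence


# Toy stand-ins: the "denoiser" only knows one fact, and the "critic"
# flags any reconstruction that differs from the claim.
FACT = "Paris is the capital of France".split()


def toy_reconstruct(noisy_tokens):
    return [f if t == MASK else t for t, f in zip(noisy_tokens, FACT)]


def toy_critic(premise, hypothesis):
    return 0.0 if premise == hypothesis else 1.0


true_claim = "Paris is the capital of France"
false_claim = "Lyon is the capital of France"
# With every token masked, the true claim is recovered exactly and the false one is not.
print(semantic_energy(true_claim, toy_reconstruct, toy_critic, noise_level=1.0))   # 0.0
print(semantic_energy(false_claim, toy_reconstruct, toy_critic, noise_level=1.0))  # 1.0
```

In this toy setting the true claim is a fixed point of the corrupt-and-reconstruct loop (low energy), while the false claim drifts back toward the fact the denoiser knows (high energy), which mirrors the stable-attractor intuition stated in the abstract.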

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Misinformation and Its Impacts · Generative Adversarial Networks and Image Synthesis