RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection

Yiming Huang; Junyan Zhang; Zihao Wang; Biquan Bie; Yunzhong Qiu; Xuming Hu; Yi R. Fung; Xinlei He

arXiv:2505.15386 · cs.CL · February 3, 2026

TL;DR

RePPL introduces a novel method to improve hallucination detection in large language models by recalibrating uncertainty measurements based on semantic propagation and language generation, providing explainable token-level insights.

Contribution

It proposes RePPL, an approach that recalibrates uncertainty scores to improve both hallucination detection and its explanation in LLMs. This addresses a limitation of previous uncertainty-based methods: they cannot explain which parts of the input tend to trigger hallucinations.

Findings

1. Achieves an average AUC of 0.833 across QA datasets.
2. Provides token-level uncertainty scores as explanations.
3. Outperforms previous methods in hallucination detection.
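The headline AUC is the area under the ROC curve obtained when an uncertainty score is used to separate hallucinated answers from correct ones. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The sketch below computes it with the pairwise (Mann-Whitney) formulation. The scores are made-up toy values, not data from the paper, and `roc_auc` is a hypothetical helper name. The quadratic loop favors clarity over speed.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney form).

    labels: 1 = hallucinated answer, 0 = correct answer (assumed convention).
    Returns the probability that a randomly chosen hallucinated answer has a
    higher uncertainty score than a randomly chosen correct one; ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up uncertainty scores for four answers; not results from the paper.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```

Here three of the four (hallucinated, correct) pairs are ranked correctly, giving 0.75. For large evaluation sets, a sort-based implementation such as scikit-learn's `roc_auc_score` is the usual choice.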

Abstract

Large Language Models (LLMs) have become powerful, but hallucinations remain a major obstacle to their trustworthy use. Previous works improved hallucination detection by measuring uncertainty, but they cannot explain the provenance of hallucinations, in particular which parts of the input tend to trigger them. Recent work on prompt attacks indicates that uncertainty exists in semantic propagation, where attention mechanisms gradually fuse local token information into high-level semantics across layers. Meanwhile, uncertainty also emerges in language generation, because high-level semantics are selected probabilistically when generations are sampled. Based on this, we propose RePPL, which recalibrates uncertainty measurement along these two aspects; it dispatches explainable uncertainty scores to each token and aggregates in…
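The title frames RePPL as recalibrating perplexity, and the abstract says per-token uncertainty scores are aggregated. The aggregation itself is cut off above, however. As a point of reference only, the sketch below computes standard perplexity (the exponential of the mean per-token negative log-likelihood) from token log-probabilities. It also includes a hypothetical weighted variant in which each token's negative log-likelihood is scaled by a per-token uncertainty weight. The function names, the weights, and the weighting scheme are illustrative assumptions, not the paper's formula.

```python
import math
from typing import List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp of the mean negative log-likelihood (natural log)."""
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token log-probability")
    nll = [-lp for lp in token_logprobs]
    return math.exp(sum(nll) / len(nll))


def weighted_perplexity(token_logprobs: Sequence[float], weights: Sequence[float]) -> float:
    """HYPOTHETICAL recalibration: exp of a weighted mean negative log-likelihood.

    `weights` stands in for per-token uncertainty scores. This weighting is an
    illustrative assumption, not the RePPL aggregation.
    """
    if len(token_logprobs) != len(weights) or len(weights) == 0:
        raise ValueError("log-probs and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_nll = sum(w * -lp for w, lp in zip(weights, token_logprobs))
    return math.exp(weighted_nll / total)


def token_contributions(token_logprobs: Sequence[float], weights: Sequence[float]) -> List[float]:
    """HYPOTHETICAL attribution: each token's share of the total weighted NLL (sums to 1)."""
    parts = [w * -lp for w, lp in zip(weights, token_logprobs)]
    total = sum(parts)
    if total <= 0:
        raise ValueError("total weighted NLL must be positive")
    return [p / total for p in parts]


# Toy example: log-probabilities of three generated tokens (made-up values).
logprobs = [math.log(0.9), math.log(0.5), math.log(0.05)]
uniform = [1.0, 1.0, 1.0]
print(perplexity(logprobs))                            # about 3.54
print(weighted_perplexity(logprobs, uniform))          # equals plain perplexity
print(token_contributions(logprobs, [1.0, 1.0, 3.0]))  # last token dominates
```

With uniform weights the weighted variant reduces to ordinary perplexity. Non-uniform weights shift the score toward the tokens judged most uncertain. The normalized contributions give a crude token-level attribution of the kind the abstract describes, though RePPL's actual scoring may differ.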

Taxonomy

Topics: Anomaly Detection Techniques and Applications

Methods: Softmax · Attention Is All You Need