Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations

Jinyuan Luo; Zhen Fang; Yixuan Li; Seongheon Park; Ling Chen

arXiv:2506.02696 · cs.AI · June 4, 2025

TL;DR

This paper introduces Sample-Specific Prompting (SSP), a novel perturbation-based framework that enhances the detection of hallucinations in large language models by analyzing intermediate representations, leading to more reliable self-assessment.

Contribution

The paper proposes SSP, a new method that improves hallucination detection by focusing on intermediate model representations and their sensitivity to perturbations, surpassing previous confidence-based approaches.

Findings

01. SSP outperforms existing hallucination detection methods on multiple benchmarks.

02. Analyzing intermediate representations provides a more faithful signal for factual accuracy.

03. Perturbation sensitivity correlates with the likelihood of hallucination in LLM outputs.

Abstract

Hallucination remains a key obstacle to the reliable deployment of large language models (LLMs) in real-world question answering tasks. A widely adopted strategy to detect hallucination, known as self-assessment, relies on the model's own output confidence to estimate the factual accuracy of its answers. However, this strategy assumes that the model's output distribution closely reflects the true data distribution, which may not always hold in practice. As bias accumulates through the model's layers, the final output can diverge from the underlying reasoning process, making output-level confidence an unreliable signal for hallucination detection. In this work, we propose Sample-Specific Prompting (SSP), a new framework that improves self-assessment by analyzing perturbation sensitivity at intermediate representations. These representations, being less influenced by model bias, offer a…
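
The contrast the abstract draws, between output-level confidence and sensitivity measured at an intermediate representation, can be illustrated with a toy sketch. This is not the paper's SSP method, whose details are not given in the text above. The tiny fixed-weight network, the Gaussian input noise used as the perturbation, and the names `hidden`, `output_confidence`, and `perturbation_sensitivity` are all assumptions made for illustration only.

```python
import math
import random

# Hypothetical toy network with fixed weights (an assumption, not a real LLM).
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # 2 hidden units, 3 inputs
W2 = [[1.0, -1.0], [-0.5, 0.7]]            # 2 output logits, 2 hidden units


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def hidden(x):
    """Intermediate representation: tanh of the first linear layer."""
    return [math.tanh(z) for z in matvec(W1, x)]


def output_confidence(x):
    """Output-level confidence: the largest softmax probability."""
    logits = matvec(W2, hidden(x))
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return max(exps) / sum(exps)


def perturbation_sensitivity(x, eps=0.05, n_trials=32, seed=0):
    """Mean L2 shift of the hidden representation under Gaussian input noise.

    Gaussian noise is a stand-in perturbation chosen for this sketch.
    """
    rng = random.Random(seed)
    base = hidden(x)
    total = 0.0
    for _ in range(n_trials):
        noisy = [xi + rng.gauss(0.0, eps) for xi in x]
        total += math.dist(base, hidden(noisy))
    return total / n_trials


if __name__ == "__main__":
    x = [1.0, 0.5, -0.3]
    print("output confidence:", output_confidence(x))
    print("hidden-state sensitivity:", perturbation_sensitivity(x))
```

The two scores are computed at different depths of the same network: `output_confidence` reads only the final distribution, while `perturbation_sensitivity` reads how much the hidden state moves when the input is shaken. Whether and how such a sensitivity score tracks hallucination in a real LLM is the paper's empirical claim, not something this toy demonstrates.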

Taxonomy

Topics: Hallucinations in medical conditions · Autoimmune Neurological Disorders and Treatments · Drug-Induced Ocular Toxicity