Hallucination Begins Where Saliency Drops
Xiaofeng Zhang, Yuanchao Zhu, Chaochen Gu, Xiaosong Yuan, Qiyan Zhao, Jiawei Cao, Feilong Tang, Sinan Fan, Yaomin Shen, Chen Shen, Hao Tang

TL;DR
This paper introduces LVLMs-Saliency, a gradient-aware diagnostic framework that identifies hallucinations in large vision-language models (LVLMs) by fusing attention weights with their gradients, and proposes two inference-time methods, saliency-guided rejection sampling (SGRS) and local coherence reinforcement (LocoRE), to reduce them.
Contribution
The paper presents a novel gradient-aware saliency framework and two inference-time techniques to detect and mitigate hallucinations in LVLMs, improving reliability and interpretability.
Findings
Hallucinations frequently arise when preceding output tokens show low saliency toward the prediction of the next token.
Saliency-guided rejection sampling reduces hallucination rates.
Local coherence reinforcement improves contextual memory.
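The idea behind saliency-guided rejection sampling can be illustrated with a small sketch. This page does not reproduce the paper's Algorithm 1, so everything below is an assumption made for illustration: the function name `sample_with_saliency_gate`, the `saliency_of` callback, the fixed `threshold`, and the fallback to the best rejected candidate are all hypothetical and may differ from the authors' procedure.

```python
import random

def sample_with_saliency_gate(probs, saliency_of, threshold, max_tries=5, rng=None):
    """Sample a token id, rejecting candidates whose saliency is below threshold.

    Hypothetical sketch, not the paper's Algorithm 1.
    probs: next-token probabilities indexed by token id.
    saliency_of: callable mapping a token id to a saliency score (assumed given).
    Returns (token, score, accepted).
    """
    rng = rng or random.Random(0)
    remaining = dict(enumerate(probs))
    best_token, best_score = None, float("-inf")
    for _ in range(max_tries):
        tokens = list(remaining)
        weights = [remaining[t] for t in tokens]
        if not tokens or sum(weights) <= 0:
            break
        # Draw one candidate in proportion to its remaining probability mass.
        token = rng.choices(tokens, weights=weights, k=1)[0]
        score = saliency_of(token)
        if score >= threshold:
            return token, score, True
        # Remember the best rejected candidate as a fallback.
        if score > best_score:
            best_token, best_score = token, score
        # Exclude the rejected candidate before resampling.
        del remaining[token]
    return best_token, best_score, False

# Toy usage: token 0 is most likely but poorly supported by prior context.
toy_saliency = {0: 0.01, 1: 0.5, 2: 0.9}
token, score, accepted = sample_with_saliency_gate(
    [0.7, 0.2, 0.1], toy_saliency.get, threshold=0.3, max_tries=3
)
```

In this toy setup the most probable token is rejected whenever it is drawn, because its saliency falls below the threshold, and sampling continues over the remaining candidates.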
Abstract
Recent studies have examined attention dynamics in large vision-language models (LVLMs) to detect hallucinations. However, existing approaches remain limited in reliably distinguishing hallucinated from factually grounded outputs, as they rely solely on forward-pass attention patterns and neglect gradient-based signals that reveal how token influence propagates through the network. To bridge this gap, we introduce LVLMs-Saliency, a gradient-aware diagnostic framework that quantifies the visual grounding strength of each output token by fusing attention weights with their input gradients. Our analysis uncovers a decisive pattern: hallucinations frequently arise when preceding output tokens exhibit low saliency toward the prediction of the next token, signaling a breakdown in contextual memory retention. Leveraging this insight, we propose a dual-mechanism inference-time framework to…
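As a rough illustration of fusing attention weights with their gradients, the sketch below scores each context token by |a_j · ∂L/∂a_j| for a single toy attention row, where L is the negative log-likelihood of the next token. The single-head setup, the linear output layer `w_out`, and all function names are assumptions for illustration; the paper's exact saliency definition and aggregation over heads and layers are not given on this page.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_row(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

def loss_from_attention(attn, values, w_out, target):
    """Negative log-likelihood of `target` given attention weights `attn`."""
    dim = len(values[0])
    ctx = [sum(a * v[d] for a, v in zip(attn, values)) for d in range(dim)]
    logits = [sum(w * c for w, c in zip(row, ctx)) for row in w_out]
    return -math.log(softmax(logits)[target])

def grad_wrt_attention(attn, values, w_out, target):
    """Analytical dL/da_j, treating each attention weight as a free variable."""
    dim = len(values[0])
    ctx = [sum(a * v[d] for a, v in zip(attn, values)) for d in range(dim)]
    logits = [sum(w * c for w, c in zip(row, ctx)) for row in w_out]
    probs = softmax(logits)
    # Gradient of cross-entropy w.r.t. logits: probs minus one-hot target.
    dlogits = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    # Backpropagate through the linear output layer to the context vector.
    dctx = [sum(dl * row[d] for dl, row in zip(dlogits, w_out)) for d in range(dim)]
    # Each attention weight scales one value vector, so dL/da_j = v_j . dctx.
    return [sum(v[d] * dctx[d] for d in range(dim)) for v in values]

def saliency_row(query, keys, values, w_out, target):
    """Per-token saliency |a_j * dL/da_j| for one next-token prediction."""
    attn = attention_row(query, keys)
    grads = grad_wrt_attention(attn, values, w_out, target)
    return [abs(a * g) for a, g in zip(attn, grads)]

# Toy example: three context tokens, two-dimensional vectors, two-word vocabulary.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_out = [[2.0, 0.0], [0.0, 2.0]]
scores = saliency_row([1.0, 0.0], keys, values, w_out, target=0)
```

A token with a large attention weight but a near-zero gradient gets a low score here, which is the kind of case that attention weights alone would not flag.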
Peer Reviews
Decision: ICLR 2026 Oral
The proposed method in this paper is highly novel. While most previous studies have focused on leveraging attention mechanisms to reduce hallucination, this work explores the use of saliency, which represents a promising and valuable direction for further research.
1. The key finding and the basis for the proposed method in this paper is that “Hallucinations occur when prior output tokens show low saliency to the next token prediction, indicating a failure of contextual memory.” However, after reading the entire manuscript, I could not find any statistically significant validation of this claim. Figure 1 appears to be merely a case study and does not provide statistical evidence to support the finding. 2. From Table 1, the proposed method does not appear
1. The paper provides a compelling, interpretable explanation of LVLM hallucinations, showing through both empirical results and visualizations that attention-alone is insufficient and that joint attention-gradient saliency captures key failure modes (see Figures 1 and 2). 2. The proposed SGRS (Algorithm 1) and LocoRE (Algorithm 2) do not require retraining, operate at inference, and effectively use saliency to dynamically control and reinforce generation. This “plug-and-play” aspect increases
1. The SGRS component's reliance on backward passes during inference imposes significant memory constraints, limiting its applicability to models up to 13B parameters and preventing scalability to larger LVLMs like Qwen2.5-VL-32B or 72B, which undermines the method's claimed broad generalizability and real-world deployment feasibility. 2. While the paper asserts a direct causal link between low saliency and hallucinations, the evidence is primarily correlational from observational patterns in f
The paper offers a novel and well-motivated perspective on hallucination detection: leveraging gradients of attention weights to localize hallucination-prone tokens. To the best of my knowledge, this is the first systematic use of this signal, underscoring the work’s novelty. The two inference-time interventions—SGRS and LocoRE—are tightly coupled to this insight and translate it into practical, training-free improvements with minimal changes to the base model. The experimental study is thorough
My main concern is that the claimed “hallucination pattern” is supported largely by a few curated cases (Figs. 1–2), which risks selection bias. To make the claim compelling, the paper should provide population-level, statistically significant evidence to substantiate this claim.
Taxonomy
Topics: Mind wandering and attention · Hallucinations in medical conditions · Adversarial Robustness in Machine Learning
