Segmentation-Based Attention Entropy: Detecting and Mitigating Object Hallucinations in Large Vision-Language Models

Jiale Song; Jiaxin Luo; Xue-song Tang; Kuangrong Hao; Mingbo Zhao

arXiv:2603.16558·cs.CV·March 18, 2026

Open Access

TL;DR

This paper introduces Segmentation-based Attention Entropy (SAE), a novel method leveraging semantic segmentation to detect and reduce object hallucinations in large vision-language models, improving their reliability without extra training.

Contribution

The paper presents SAE, an approach that quantifies visual attention uncertainty and uses it to guide attention adjustment at inference time, mitigating hallucinations in LVLMs.

Findings

01. SAE significantly reduces object hallucinations in LVLMs.

02. The method operates at inference time, with no additional training cost.

03. The method is effective in real-world robotic scenarios.

Abstract

Large Vision-Language Models (LVLMs) achieve strong performance on many multimodal tasks, but object hallucinations severely undermine their reliability. Most existing studies focus on the text modality, attributing hallucinations to overly strong language priors and insufficient visual grounding. In contrast, we observe that abnormal attention patterns within the visual modality can also give rise to hallucinated objects. Building on this observation, we propose Segmentation-based Attention Entropy (SAE), which leverages semantic segmentation to quantify visual attention uncertainty in an object-level semantic space. Based on SAE, we further design a reliability score for hallucination detection and an SAE-guided attention adjustment method that modifies visual attention at inference time to mitigate hallucinations. We evaluate our approach on public benchmarks and in real embodied…
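
The abstract does not give the SAE formula, so the following is only a minimal sketch of one plausible reading: aggregate patch-level visual attention into per-segment mass using a segmentation map, then take the normalized Shannon entropy of that distribution. The function names (`segment_attention_entropy`, `reliability_score`, `sharpen_attention`), the `1 - entropy` score, and the `boost` parameter are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def _segment_mass(attention, segment_ids):
    """Sum normalized attention weights per segment label."""
    if len(attention) != len(segment_ids):
        raise ValueError("attention and segment_ids must have the same length")
    total = sum(attention)
    if total <= 0:
        raise ValueError("attention must have positive total mass")
    mass = defaultdict(float)
    for weight, seg in zip(attention, segment_ids):
        mass[seg] += weight / total
    return mass


def segment_attention_entropy(attention, segment_ids):
    """Normalized entropy in [0, 1] of attention mass over segments.

    Assumption: low entropy means attention is concentrated on few objects.
    """
    mass = _segment_mass(attention, segment_ids)
    if len(mass) < 2:
        return 0.0  # a single segment carries no uncertainty
    entropy = -sum(p * math.log(p) for p in mass.values() if p > 0)
    return entropy / math.log(len(mass))


def reliability_score(attention, segment_ids):
    """Hypothetical score: higher when attention is focused on few segments."""
    return 1.0 - segment_attention_entropy(attention, segment_ids)


def sharpen_attention(attention, segment_ids, boost=2.0):
    """Hypothetical adjustment: upweight the dominant segment, then renormalize."""
    mass = _segment_mass(attention, segment_ids)
    top = max(mass, key=mass.get)
    scaled = [w * boost if s == top else w for w, s in zip(attention, segment_ids)]
    norm = sum(scaled)
    return [w / norm for w in scaled]


# Four image patches; patches 0 and 1 belong to the same segment.
segments = [0, 0, 1, 2]
focused = [0.7, 0.2, 0.05, 0.05]
diffuse = [0.25, 0.25, 0.25, 0.25]
print(reliability_score(focused, segments) > reliability_score(diffuse, segments))
```

In this toy setup, the focused attention map scores as more reliable than the diffuse one, and `sharpen_attention` lowers the entropy of the diffuse map without any retraining, which mirrors the inference-time framing in the abstract. How the authors actually define the score and the adjustment may differ.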

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Visual Attention and Saliency Detection