Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression

Sreetama Sarkar; Yue Che; Alex Gavin; Peter A. Beerel; Souvik Kundu

arXiv:2505.16411 · cs.CV · October 17, 2025

TL;DR

This paper introduces SPIN, an inference-time attention head suppression method that reduces hallucinations in vision-language models without significant compute or latency overhead. For each text query token, it suppresses the attention heads that attend least to image tokens.

Contribution

SPIN is a task-agnostic approach that suppresses selected attention heads during inference to mitigate hallucinations in large vision-language models (LVLMs) without significant added compute.

Findings

01 Reduces hallucination scores by up to 2.7x

02 Maintains F1 performance

03 Increases throughput by 1.8x

Abstract

Despite their remarkable progress in multimodal understanding tasks, large vision language models (LVLMs) often suffer from "hallucinations", generating texts misaligned with the visual context. Existing methods aimed at reducing hallucinations through inference time intervention incur a significant increase in latency. To mitigate this, we present SPIN, a task-agnostic attention-guided head suppression strategy that can be seamlessly integrated during inference, without incurring any significant compute or latency overhead. We investigate whether hallucination in LVLMs can be linked to specific model components. Our analysis suggests that hallucinations can be attributed to a dynamic subset of attention heads in each layer. Leveraging this insight, for each text query token, we selectively suppress attention heads that exhibit low attention to image tokens, keeping the top-K attention…
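The per-token head selection described in the abstract can be illustrated with a minimal sketch. Because the abstract is truncated, several details here are assumptions rather than the authors' implementation: heads are ranked by the sum of their attention weights over image-token positions, the top-K heads are kept unchanged, and the remaining heads are scaled by a hypothetical `suppress_factor` (0.0 means fully pruned). All function and parameter names are illustrative and are not taken from the official repository.

```python
from typing import List, Sequence


def image_attention_mass(attn_row: Sequence[float],
                         image_positions: Sequence[int]) -> float:
    """Sum one head's attention weights over the image-token positions."""
    return sum(attn_row[j] for j in image_positions)


def head_suppression_mask(attn: Sequence[Sequence[float]],
                          image_positions: Sequence[int],
                          top_k: int,
                          suppress_factor: float = 0.0) -> List[float]:
    """Return one scale factor per head for a single text query token.

    attn[h][j] is the softmax attention weight that head h assigns to key j.
    The top_k heads with the largest image attention keep a factor of 1.0;
    every other head gets suppress_factor (an assumed knob, not from the paper).
    """
    num_heads = len(attn)
    if not 0 <= top_k <= num_heads:
        raise ValueError("top_k must be between 0 and the number of heads")
    scores = [image_attention_mass(row, image_positions) for row in attn]
    # Rank head indices by image attention, highest first.
    ranked = sorted(range(num_heads), key=lambda h: scores[h], reverse=True)
    keep = set(ranked[:top_k])
    return [1.0 if h in keep else suppress_factor for h in range(num_heads)]


def scale_head_outputs(head_outputs: Sequence[Sequence[float]],
                       mask: Sequence[float]) -> List[List[float]]:
    """Scale each head's output vector by its factor before concatenation."""
    return [[m * x for x in out] for out, m in zip(head_outputs, mask)]


if __name__ == "__main__":
    # Toy example: 3 heads, 3 key positions; positions 0 and 1 are image tokens.
    attn = [
        [0.7, 0.2, 0.1],  # head 0 attends mostly to the image
        [0.1, 0.1, 0.8],  # head 1 attends mostly to text
        [0.4, 0.3, 0.3],  # head 2 is in between
    ]
    mask = head_suppression_mask(attn, image_positions=[0, 1], top_k=2)
    print(mask)
    print(scale_head_outputs([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], mask))
```

In this toy case head 1 places the least attention on image positions, so it is the one suppressed. Because the mask is recomputed for every query token, the suppressed subset can change from token to token, which matches the abstract's description of a dynamic subset of heads in each layer.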

Code & Models

Repositories

yueche77/spin (PyTorch, official)

Videos

Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression· underline

Taxonomy

Methods: Softmax · Attention Is All You Need