Attention to details, logits to truth: visual-aware attention and logits enhancement to mitigate hallucinations in LVLMs

Jingyi Wang; Fei Li; Rujie Liu

arXiv:2602.09521 · cs.CV · February 11, 2026

Open Access

TL;DR

This paper introduces a training-free attentional intervention method that steers visual attention in LVLMs toward task-relevant image tokens. This reduces hallucinations while maintaining output quality.

Contribution

It proposes a training-free algorithm that reweights visual attention using vision-text cross-attention similarities and increases the contribution of visual tokens during beam search decoding.
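
The reweighting idea above can be illustrated with a toy sketch. This is not the authors' algorithm. The abstract says only that vision-text cross-attention submatrices are used to build reweighting matrices, so several pieces here are hypothetical: the reduction of that submatrix to one relevance score per visual token, the helper names `visual_relevance` and `reweight_row`, and the strength parameter `alpha`. Attention is stored as a list of rows, where `attn[i][j]` is the post-softmax weight from query position `i` to key position `j`.

```python
from typing import List, Sequence

# Hypothetical sketch: NOT the authors' implementation.
# attn[i][j] = post-softmax attention from query position i to key position j.


def visual_relevance(attn: List[List[float]],
                     text_idx: Sequence[int],
                     vis_idx: Sequence[int]) -> List[float]:
    """Mean attention each visual token receives from the text queries.

    The entries attn[t][v] for t in text_idx, v in vis_idx form the
    vision-text cross-attention submatrix. Reducing it to one score per
    visual token is an assumption made for illustration.
    """
    return [sum(attn[t][v] for t in text_idx) / len(text_idx) for v in vis_idx]


def reweight_row(row: List[float],
                 vis_idx: Sequence[int],
                 relevance: Sequence[float],
                 alpha: float = 1.0) -> List[float]:
    """Scale attention to each visual token by 1 + alpha * (relative relevance),
    then renormalize so the row is still a probability distribution."""
    peak = max(relevance)
    new_row = list(row)
    for v, r in zip(vis_idx, relevance):
        # The most relevant token gets weight 1 + alpha; irrelevant ones keep weight 1.
        weight = 1.0 + alpha * (r / peak if peak > 0 else 0.0)
        new_row[v] = row[v] * weight
    total = sum(new_row)
    return [x / total for x in new_row]


# Toy example: positions 0-2 are visual tokens, 3-4 are text tokens (causal mask).
attn = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.3, 0.3, 0.4, 0.0, 0.0],
    [0.1, 0.6, 0.1, 0.2, 0.0],
    [0.1, 0.4, 0.1, 0.2, 0.2],
]
vis_idx, text_idx = [0, 1, 2], [3, 4]
rel = visual_relevance(attn, text_idx, vis_idx)  # roughly [0.1, 0.5, 0.1]
new_last = reweight_row(attn[4], vis_idx, rel, alpha=1.0)
print([round(x, 3) for x in new_last])  # [0.083, 0.556, 0.083, 0.139, 0.139]
```

Unlike a uniform boost, this sketch leaves low-relevance visual tokens at weight 1. After renormalization their share of attention falls (0.1 to about 0.083), while the most relevant visual token rises from 0.4 to about 0.556.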

Findings

1. Reduces hallucinations in mainstream LVLMs
2. Maintains accuracy and coherence of generated content
3. Effective across multiple LVLM architectures

Abstract

Existing Large Vision-Language Models (LVLMs) exhibit insufficient visual attention, which leads to hallucinations. To alleviate this problem, some previous studies adjust and amplify visual attention. These methods share a limitation: boosting attention for all visual tokens inevitably also increases attention to task-irrelevant tokens. To tackle this challenge, we propose a training-free attentional intervention algorithm that enhances attention to task-relevant tokens, based on the premise that task-relevant tokens generally exhibit high visual-textual similarity. Specifically, the vision-text cross-attention submatrices, which represent visual-textual correlations, are extracted to construct reweighting matrices that reallocate attention. In addition, to enhance the contribution of visual tokens, we inject visual attention values into the beam search decoding to identify solutions…
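
The decoding step can be sketched as a beam search whose ranking score adds a visual-attention term to the log-probability. The abstract is cut off before it says how the injected attention values select among beams, so the following are all hypothetical and chosen only to show the mechanism: the additive rule `logp + beta * visual_attn`, the names `visual_beam_search` and `toy_step`, the weight `beta`, and the toy probabilities.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: NOT the authors' implementation.
# step(prefix) returns {token: (log_prob, visual_attn)}, where visual_attn is the
# attention mass the model placed on visual tokens while emitting that token.
Prefix = Tuple[str, ...]
StepFn = Callable[[Prefix], Dict[str, Tuple[float, float]]]


def visual_beam_search(step: StepFn, beam_width: int, max_len: int,
                       beta: float, eos: str = "<eos>") -> Tuple[Prefix, float]:
    """Beam search whose score is log-probability plus beta * visual attention."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Prefix, float]] = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))  # finished beam is carried over
                continue
            for tok, (logp, vis) in step(prefix).items():
                # Assumed scoring rule: language-model evidence + visual grounding bonus.
                candidates.append((prefix + (tok,), score + logp + beta * vis))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(p and p[-1] == eos for p, _ in beams):
            break
    return beams[0]


def toy_step(prefix: Prefix) -> Dict[str, Tuple[float, float]]:
    """Toy model: "dog" draws more attention to the image, but the language prior prefers "cat"."""
    if not prefix:
        return {"cat": (math.log(0.6), 0.1), "dog": (math.log(0.4), 0.7)}
    return {"<eos>": (0.0, 0.0)}


print(visual_beam_search(toy_step, beam_width=2, max_len=3, beta=0.0)[0])  # ('cat', '<eos>')
print(visual_beam_search(toy_step, beam_width=2, max_len=3, beta=1.0)[0])  # ('dog', '<eos>')
```

With `beta = 0.0` this reduces to ordinary beam search, and the language prior wins. With `beta = 1.0` the candidate that drew more attention to the image overtakes it. An additive per-step bonus is only one simple way to inject attention values into the beam score. In practice those values would presumably be read from the model's attention maps at each decoding step.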

Taxonomy

Topics: Multimodal Machine Learning Applications · Hallucinations in medical conditions · Adversarial Robustness in Machine Learning