Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs

Shi Liu; Kecheng Zheng; Wei Chen

arXiv:2407.21771 · cs.CV · August 1, 2024 · 1 citation

TL;DR

This paper proposes a training-free method that adjusts attention weights and logits so that LVLMs attend more to the image, significantly reducing hallucinations and improving multi-modal understanding.
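The TL;DR says the method adjusts logits as well as attention weights. The following is a minimal illustration, assuming a contrastive-style refinement that amplifies the image-conditioned log-probabilities and subtracts the text-only ones, in the spirit of contrastive decoding. The function names, the weight `gamma`, and the exact formula are hypothetical choices for illustration, not code from the paper or the linked repository.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def refine_logits(with_image: List[float], text_only: List[float],
                  gamma: float = 1.1) -> List[float]:
    """Hypothetical image-centric refinement (an assumption, not the paper's formula).

    Returns gamma * log p(y | image, text) - (gamma - 1) * log p(y | text).
    With gamma = 1 this is exactly the ordinary image-conditioned log-probability;
    larger gamma penalizes tokens that the text alone already predicts.
    """
    if len(with_image) != len(text_only):
        raise ValueError("both logit vectors must cover the same vocabulary")
    lp_img = log_softmax(with_image)
    lp_txt = log_softmax(text_only)
    return [gamma * a - (gamma - 1.0) * b for a, b in zip(lp_img, lp_txt)]


if __name__ == "__main__":
    with_image = [1.8, 2.0, 0.0]  # toy logits for a 3-token vocabulary
    text_only = [0.0, 3.0, 0.0]   # the text prior alone strongly prefers token 1
    refined = refine_logits(with_image, text_only, gamma=2.0)
    print(max(range(3), key=lambda i: with_image[i]))  # 1
    print(max(range(3), key=lambda i: refined[i]))     # 0
```

With `gamma = 2`, each refined score equals 2 × (image-conditioned logit) minus (text-only logit), up to a constant shared by all tokens, so the toy example switches from token 1, which the text prior favors, to token 0, which is preferred only once the image evidence is isolated.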

Contribution

It introduces a novel algorithm that rebalances the influence of image and text in LVLMs, addressing hallucination without any additional training.

Findings

01 Reduces hallucination frequency in various LVLMs
02 Improves alignment between visual input and language output
03 Enhances multi-modal comprehension without extra training

Abstract

Existing Large Vision-Language Models (LVLMs) primarily align the image features of a vision encoder with Large Language Models (LLMs) to leverage their superior text generation capabilities. However, the scale disparity between the vision encoder and the language model may lead the LLM to assume a predominant role in multi-modal comprehension. This imbalance can cause LVLMs to hallucinate. Concretely, LVLMs may generate the same descriptions with or without visual input, indicating that certain outputs are influenced solely by the context text. We refer to this phenomenon as "text inertia." To counteract this issue, we introduce a training-free algorithm that finds an equilibrium point between image comprehension and language inference. Specifically, we adaptively adjust and amplify the attention weights assigned to image tokens, thereby granting greater prominence to…
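The abstract describes adaptively adjusting and amplifying the attention weights assigned to image tokens. Here is a minimal sketch for a single query's attention row, assuming the boost is applied to pre-softmax scores at image-token positions as s + alpha·|s|. Which layers and heads are modified, how alpha is chosen adaptively, and the helper names used here are all assumptions, since the abstract is truncated before those details.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one attention row."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def amplify_image_attention(scores: Sequence[float], is_image: Sequence[bool],
                            alpha: float = 0.5) -> List[float]:
    """Hypothetical sketch: raise pre-softmax scores at image-token positions.

    Each image score s becomes s + alpha * |s|, which is never smaller than s,
    so after softmax the total attention mass on image tokens cannot decrease.
    Text-token scores are left unchanged.
    """
    if len(scores) != len(is_image):
        raise ValueError("scores and is_image must have the same length")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    adjusted = [s + alpha * abs(s) if img else s for s, img in zip(scores, is_image)]
    return softmax(adjusted)


def image_mass(weights: Sequence[float], is_image: Sequence[bool]) -> float:
    """Total attention probability assigned to image tokens."""
    return sum(w for w, img in zip(weights, is_image) if img)


if __name__ == "__main__":
    scores = [1.0, 0.5, 2.0, 2.5]           # toy scores: 2 image tokens, 2 text tokens
    mask = [True, True, False, False]
    before = softmax(scores)
    after = amplify_image_attention(scores, mask, alpha=0.5)
    print(image_mass(before, mask), image_mass(after, mask))  # the image share rises
```

Because only image positions change and softmax renormalizes the row, the boost shifts probability mass from text tokens to image tokens while leaving the relative weights among text tokens untouched.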

Code & Models

Repositories

hasanar1f/llava-hallunication-fix (jax)

Taxonomy

Topics: Hallucinations in medical conditions

Methods: Softmax · Attention Is All You Need · ALIGN