Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

Weihong Zhong; Xiaocheng Feng; Liang Zhao; Qiming Li; Lei Huang; Yuxuan Gu; Weitao Ma; Yuan Xu; Bing Qin

arXiv:2407.00569 · cs.CV · August 6, 2024 · 1 citation

TL;DR

This paper investigates how multimodal hallucinations in large vision-language models can snowball through interactions, leading to false claims, and proposes a training-free mitigation method called Residual Visual Decoding.

Contribution

It introduces MMHalSnowball, a framework to evaluate hallucination snowballing in LVLMs, and proposes a novel mitigation method that reduces hallucination effects without retraining.

Findings

01. The performance of open-source LVLMs drops by at least 31% when they encounter previously generated hallucinations.

02. The proposed method, Residual Visual Decoding, mitigates over 24% of hallucinations.

03. LVLMs are prone to accepting and propagating generated hallucinations.

Abstract

Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, will LVLMs be misled and respond incorrectly, even though the ground visual information exists? To answer this, we propose a framework called MMHalSnowball to evaluate LVLMs' behaviors when encountering generated hallucinations, where LVLMs are required to answer specific visual questions within a curated hallucinatory conversation. Crucially, our experiment shows that the performance of open-source LVLMs drops by at least $31\%$, indicating that LVLMs are prone to accept the generated…

Code & Models

Repositories

whongzhong/MMHalSnowball (PyTorch, official)

Videos

Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models · Underline

Taxonomy

Topics: Misinformation and Its Impacts · Data-Driven Disease Surveillance