Revealing Multi-View Hallucination in Large Vision-Language Models
Wooje Park, Insu Lee, Soohyun Kim, Jaeyun Jang, Minyoung Noh, Kyuhong Shim, Byonghyo Shim

TL;DR
This paper identifies and analyzes multi-view hallucination in large vision-language models (LVLMs), introduces the MVH-Bench benchmark for evaluation, and proposes RSCD, a training-free decoding method that improves MVH-Bench performance by up to 34.6 points.
Contribution
The paper systematically characterizes multi-view hallucination, constructs MVH-Bench (4.8k question-answer pairs) for evaluation, and introduces Reference Shift Contrastive Decoding (RSCD), a decoding technique that mitigates hallucination without additional training.
Findings
LVLMs often confuse visual information from different instances or viewpoints.
RSCD improves model performance by up to 34.6 points on MVH-Bench.
The benchmark reveals significant challenges in current LVLMs' multi-view understanding.
Abstract
Large vision-language models (LVLMs) are increasingly being applied to multi-view image inputs captured from diverse viewpoints. However, despite this growing use, current LVLMs often confuse or mismatch visual information originating from different instances or viewpoints, a phenomenon we term multi-view hallucination. To systematically analyze this problem, we construct MVH-Bench, a benchmark comprising 4.8k question-answer pairs targeting two types of hallucination: cross-instance and cross-view. Empirical results show that recent LVLMs struggle to correctly associate visual evidence with its corresponding instance or viewpoint. To overcome this limitation, we propose Reference Shift Contrastive Decoding (RSCD), a training-free decoding technique that suppresses visual interference by generating negative logits through attention masking. Experiments on MVH-Bench with Qwen2.5-VL and…
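The contrastive step that the abstract describes can be illustrated with a minimal sketch. It assumes the generic contrastive-decoding combination `(1 + alpha) * positive - alpha * negative` over log-probabilities; the abstract does not give RSCD's exact formula or say which tokens are masked, so the function names, the `alpha` weight, and the toy logits below are all hypothetical and not the authors' implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_logits(pos_logits, neg_logits, alpha=1.0):
    """Combine two forward passes (assumed generic form, not RSCD's exact rule).

    pos_logits: next-token logits from the normal forward pass.
    neg_logits: next-token logits from a pass with attention masking applied,
                standing in for the "negative logits" in the abstract.
    alpha:      hypothetical contrast strength; alpha = 0 recovers the
                ordinary log-probabilities of the positive pass.
    """
    pos = log_softmax(pos_logits)
    neg = log_softmax(neg_logits)
    return [(1 + alpha) * p - alpha * n for p, n in zip(pos, neg)]


def greedy_token(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary and made-up logits, for illustration only.
vocab = ["red", "blue", "green"]
pos_logits = [2.0, 2.2, 0.0]  # normal pass: interference slightly favors "blue"
neg_logits = [0.5, 2.5, 0.0]  # masked pass: interference dominates

plain_choice = vocab[greedy_token(pos_logits)]
contrastive_choice = vocab[greedy_token(contrastive_logits(pos_logits, neg_logits))]
```

In this toy case, plain greedy decoding picks "blue", while the contrastive scores pick "red", because the token that the masked pass favors is penalized. In a real LVLM both logit vectors would come from two forward passes over the same prompt, which roughly doubles the per-token decoding cost.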
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Image Processing Techniques · Multimodal Machine Learning Applications
