Revealing Multi-View Hallucination in Large Vision-Language Models

Wooje Park; Insu Lee; Soohyun Kim; Jaeyun Jang; Minyoung Noh; Kyuhong Shim; Byonghyo Shim

arXiv:2603.23934 · cs.CV · March 26, 2026

TL;DR

This paper identifies and analyzes multi-view hallucination in large vision-language models (LVLMs), introduces the MVH-Bench benchmark to measure it, and proposes RSCD, a training-free decoding method that improves performance on MVH-Bench by up to 34.6 points.

Contribution

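The paper systematically characterizes multi-view hallucination, constructs MVH-Bench (4.8k question-answer pairs covering cross-instance and cross-view hallucination) to evaluate it, and introduces Reference Shift Contrastive Decoding (RSCD), a decoding technique that mitigates it without additional training.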

Findings

01 LVLMs often confuse or mismatch visual information from different instances or viewpoints.

02 RSCD improves model performance by up to 34.6 points on MVH-Bench.

03 On MVH-Bench, recent LVLMs struggle to associate visual evidence with the correct instance or viewpoint.

Abstract

Large vision-language models (LVLMs) are increasingly being applied to multi-view image inputs captured from diverse viewpoints. However, despite this growing use, current LVLMs often confuse or mismatch visual information originating from different instances or viewpoints, a phenomenon we term multi-view hallucination. To systematically analyze this problem, we construct MVH-Bench, a benchmark comprising 4.8k question-answer pairs targeting two types of hallucination: cross-instance and cross-view. Empirical results show that recent LVLMs struggle to correctly associate visual evidence with its corresponding instance or viewpoint. To overcome this limitation, we propose Reference Shift Contrastive Decoding (RSCD), a training-free decoding technique that suppresses visual interference by generating negative logits through attention masking. Experiments on MVH-Bench with Qwen2.5-VL and…
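
The abstract describes RSCD only at a high level: a negative set of logits is produced through attention masking and used to suppress visual interference. The sketch below illustrates the general contrastive-decoding idea, not the authors' exact formulation. The combination rule `(1 + alpha) * pos - alpha * neg`, the `alpha` parameter, and all function names are assumptions borrowed from common contrastive-decoding practice. How the attention mask is built is not specified here, so the negative logits are simply passed in as inputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(pos_logits, neg_logits, alpha=1.0):
    """Combine two forward passes (hypothetical rule, not taken from the paper).

    pos_logits: next-token logits from the ordinary forward pass.
    neg_logits: next-token logits from a pass run with an attention mask
                (assumed here to be computed elsewhere).
    alpha:      assumed contrast strength; alpha = 0 returns pos_logits.
    """
    if len(pos_logits) != len(neg_logits):
        raise ValueError("both passes must score the same vocabulary")
    return [(1.0 + alpha) * p - alpha * n for p, n in zip(pos_logits, neg_logits)]


def greedy_token(logits):
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy three-word vocabulary with made-up logits, for illustration only.
vocab = ["red", "blue", "green"]
pos = [2.0, 2.2, 0.1]  # ordinary pass: "blue" narrowly wins
neg = [0.5, 2.5, 0.1]  # masked pass: "blue" is driven by interfering evidence

adjusted = contrastive_logits(pos, neg, alpha=1.0)
print(vocab[greedy_token(pos)], "->", vocab[greedy_token(adjusted)])
print([round(p, 3) for p in softmax(adjusted)])
```

In this toy case the token that the masked pass favors most strongly ("blue") is penalized, so greedy decoding switches from "blue" to "red". The logits are invented and do not reproduce any result from the paper.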

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Image Processing Techniques · Multimodal Machine Learning Applications