MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models

Jiale Li; Mingrui Wu; Zixiang Jin; Hao Chen; Jiayi Ji; Xiaoshuai Sun; Liujuan Cao; Rongrong Ji

arXiv:2508.00726 · cs.CV · August 4, 2025

TL;DR

This paper introduces MIHBench, a benchmark for evaluating multi-image hallucinations in multimodal large language models (MLLMs), and proposes a Dynamic Attention Balancing method that mitigates these hallucinations and improves model reliability.

Contribution

The paper presents what the authors describe as the first systematic study of multi-image hallucinations in MLLMs, together with a new benchmark and an attention-based mitigation technique.

Findings

01 Multi-image hallucinations increase with more input images.

02 Single-image hallucination tendencies correlate with multi-image hallucinations.

03 The proposed method reduces hallucination occurrences and improves reasoning stability.

Abstract

Despite growing interest in hallucination in Multimodal Large Language Models (MLLMs), existing studies primarily focus on single-image settings, leaving hallucination in multi-image scenarios largely unexplored. To address this gap, we conduct the first systematic study of hallucinations in multi-image MLLMs and propose MIHBench, a benchmark specifically tailored for evaluating object-related hallucinations across multiple images. MIHBench comprises three core tasks: Multi-Image Object Existence Hallucination, Multi-Image Object Count Hallucination, and Object Identity Consistency Hallucination, targeting semantic understanding across object existence, quantity reasoning, and cross-view identity consistency. Through extensive evaluation, we identify key factors associated with the occurrence of multi-image hallucinations, including: a progressive relationship between the number of image inputs…
