MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM

Bowen Dong; Minheng Ni; Zitong Huang; Guanglei Yang; Wangmeng Zuo; Lei Zhang

arXiv:2505.24238·cs.CV·June 3, 2025

TL;DR

This paper introduces MIRAGE, a benchmark for evaluating hallucinations in the reasoning chains of multimodal large language models (MLLMs), and proposes a method for reducing logical hallucinations by improving reasoning accuracy.

Contribution

It presents a novel benchmark, MIRAGE, that isolates reasoning hallucinations, and introduces a mitigation method that combines curriculum fine-tuning and hint inference.

Findings

01

Model scale and training stages impact hallucination types.

02

Current MLLMs struggle with spatial reasoning hallucinations.

03

Question types influence hallucination patterns.

Abstract

Hallucination in multimodal large language models (MLLMs) limits the correctness of their outputs. These hallucinations are multi-sourced and arise from diverse causes, yet existing benchmarks fail to adequately distinguish perception-induced hallucinations from reasoning-induced ones. This failure hinders the diagnosis of multimodal reasoning failures within MLLMs. To address this, we propose the MIRAGE benchmark, which isolates reasoning hallucinations by constructing questions where input images are correctly perceived by MLLMs yet reasoning errors persist. MIRAGE introduces multi-granular evaluation metrics for hallucination quantification: accuracy, factuality, and an LLM hallucination score. Our analysis reveals that (1) the model scale, data scale, and training stages significantly affect the degree of…

Taxonomy

Topics: Language, Metaphor, and Cognition · Semiotics and Representation Studies · Natural Language Processing Techniques

Methods: Balanced Selection · Hierarchical Information Threading