MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM
Bowen Dong, Minheng Ni, Zitong Huang, Guanglei Yang, Wangmeng Zuo, Lei Zhang

TL;DR
This paper introduces MIRAGE, a benchmark for evaluating hallucinations in the multimodal reasoning of multimodal large language models (MLLMs), and proposes a method that reduces logical hallucinations by improving reasoning accuracy.
Contribution
It presents MIRAGE, a novel benchmark that isolates reasoning hallucinations, and introduces a method that combines curriculum fine-tuning with hint inference to mitigate them.
Findings
Model scale and training stages impact hallucination types.
Current MLLMs struggle with spatial reasoning hallucinations.
Question types influence hallucination patterns.
Abstract
Hallucination in multimodal large language models (MLLMs) limits their correctness. Multimodal hallucinations, however, stem from multiple sources and diverse causes, and existing benchmarks fail to adequately distinguish perception-induced hallucinations from reasoning-induced ones. This shortcoming hinders the diagnosis of multimodal reasoning failures in MLLMs. To address it, we propose the MIRAGE benchmark, which isolates reasoning hallucinations by constructing questions whose input images MLLMs perceive correctly yet on which reasoning errors persist. MIRAGE introduces multi-granular evaluation metrics for quantifying hallucination: accuracy, factuality, and an LLM hallucination score. Our analysis reveals that (1) the model scale, data scale, and training stages significantly affect the degree of…
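The abstract describes isolating reasoning hallucinations (questions whose images are perceived correctly but are still answered wrongly) and scoring them at several granularities. The Python below is a minimal sketch of that idea under stated assumptions. The `Item` schema, exact-match answer checking, per-step boolean factuality labels, and the `judge` callable standing in for an LLM grader are all hypothetical. They are not the authors' implementation or their metric definitions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Item:
    """One evaluated question (hypothetical schema, not MIRAGE's released format)."""
    perception_ok: Sequence[bool]  # did the model answer each perception probe correctly?
    predicted: str                 # model's final answer
    gold: str                      # reference answer
    steps_correct: Sequence[bool]  # per-step factuality labels of the reasoning chain


def is_correct(item: Item) -> bool:
    # Exact match after trimming whitespace (an assumed answer-matching rule).
    return item.predicted.strip() == item.gold.strip()


def isolates_reasoning_error(item: Item) -> bool:
    # Keep items whose image content was perceived correctly but whose answer is wrong,
    # so the error can be attributed to reasoning rather than perception.
    return all(item.perception_ok) and not is_correct(item)


def accuracy(items: List[Item]) -> float:
    # Answer-level granularity: fraction of correct final answers.
    return sum(is_correct(i) for i in items) / len(items) if items else 0.0


def factuality(items: List[Item]) -> float:
    # Step-level granularity: fraction of reasoning steps labeled factually correct.
    steps = [s for i in items for s in i.steps_correct]
    return sum(steps) / len(steps) if steps else 0.0


def llm_hallucination_score(items: List[Item], judge: Callable[[Item], float]) -> float:
    # Chain-level granularity: mean of a per-chain judge score; `judge` is a
    # hypothetical placeholder for an LLM grader.
    return sum(judge(i) for i in items) / len(items) if items else 0.0
```

Separating the perception check from answer correctness is what lets a wrong answer be attributed to reasoning; the three metric functions then read the same items at answer, step, and chain level.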
Taxonomy
Topics: Language, Metaphor, and Cognition · Semiotics and Representation Studies · Natural Language Processing Techniques
Methods: Balanced Selection · Hierarchical Information Threading
