MM-THEBench: Do Reasoning MLLMs Think Reasonably?
Zhidian Huang, Zijun Yao, Ji Qi, Shangqing Tu, Junxian Ma, Jinxin Liu, Weichuan Liu, Xiaoyin Che, Lei Hou, Juanzi Li

TL;DR
This paper introduces MM-THEBench, a comprehensive benchmark for evaluating hallucinations that arise during the reasoning of multimodal large language models (MLLMs). It addresses a gap in existing benchmarks, which overlook the intermediate thinking process, and offers insight into how well these models reason.
Contribution
The paper presents MM-THEBench, a new benchmark with a detailed taxonomy, diverse data, and automated evaluation to measure hallucinations in reasoning MLLMs, filling a critical gap in current evaluation methods.
Findings
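Thinking affects hallucination rates in MLLMs: self-reflective reasoning enhances robustness but also introduces additional hallucinations.
Existing benchmarks mostly target models that predate reasoning MLLMs, so they neglect the internal thinking process and do not measure hallucinations that occur during it.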
Experiments reveal how reasoning affects model robustness and accuracy.
Abstract
Recent advances in multimodal large language models (MLLMs) mark a shift from non-thinking models to post-trained reasoning models capable of solving complex problems through thinking. However, whether such thinking mitigates hallucinations in multimodal perception and reasoning remains unclear. Self-reflective reasoning enhances robustness but introduces additional hallucinations, and subtle perceptual errors still result in incorrect or coincidentally correct answers. Existing benchmarks primarily focus on models before the emergence of reasoning MLLMs, neglecting the internal thinking process and failing to measure the hallucinations that occur during thinking. To address these challenges, we introduce MM-THEBench, a comprehensive benchmark for assessing hallucinations of intermediate CoTs in reasoning MLLMs. MM-THEBench features a fine-grained taxonomy grounded in cognitive…
Taxonomy
Topics: Mind wandering and attention · Embodied and Extended Cognition · Neuroscience and Music Perception
