Understanding and Mitigating Hallucinations in Multimodal Chain-of-Thought Models

Ji Ma; Wei Suo; Peng Wang; Yanning Zhang

arXiv:2603.27201 · cs.CV · March 31, 2026

TL;DR

This paper investigates hallucinations in Multimodal Chain-of-Thought (MCoT) models, traces fabricated text mainly to associative reasoning steps that the authors call divergent thinking, and proposes an intervention strategy that improves visual reasoning accuracy.

Contribution

The paper systematically analyzes hallucination patterns in MCoT models and introduces a simple method to localize and mitigate hallucinations induced by divergent thinking.

Findings

01 The method significantly reduces hallucinations in MCoT models.

02 It outperforms existing hallucination mitigation techniques.

03 The approach can be combined with other methods for further improvements.

Abstract

Multimodal Chain-of-Thought (MCoT) models have demonstrated impressive capability in complex visual reasoning tasks. Unfortunately, recent studies reveal that they suffer from severe hallucination problems due to diminished visual attention during the generation process. However, visual attention decay is a well-studied problem in Large Vision-Language Models (LVLMs). Considering the fundamental differences in reasoning processes between MCoT models and traditional LVLMs, we raise a basic question: do MCoT models have unique causes of hallucination? To answer this question, we systematically investigate the hallucination patterns of MCoT models and find that fabricated text is primarily generated in associative reasoning steps, which we term divergent thinking. Leveraging these insights, we introduce a simple yet effective strategy that can localize divergent…

Code & Models

Repositories

ASGO-MM/MCoT-hallucination
github
