Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, and Mike Zheng Shou

TL;DR
This survey reviews hallucination in multimodal large language models: it analyzes causes, evaluation methods, and mitigation strategies, and outlines future research directions for improving model reliability.
Contribution
It provides a detailed classification of hallucination causes, evaluation benchmarks, and mitigation techniques, intended as a reference for work on MLLM robustness.
Findings
Identification of key causes of hallucination in MLLMs
Evaluation benchmarks and metrics for hallucination detection
Overview of mitigation strategies and their effectiveness
Abstract
This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination, which poses substantial obstacles to their practical deployment and raises concerns regarding their reliability in real-world applications. This problem has attracted increasing attention, prompting efforts to detect and mitigate such inaccuracies. We review recent advances in identifying, evaluating, and mitigating these hallucinations, offering a detailed overview of the underlying causes, evaluation benchmarks, metrics, and strategies developed to address this issue.…
Taxonomy
Topics: Anomaly Detection Techniques and Applications
