Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Guanyu Zhou, Yibo Yan, Xin Zou, Kun Wang, Aiwei Liu, Xuming Hu

TL;DR
This paper introduces CausalMM, a causal inference framework that reduces modality prior-induced hallucinations in multimodal large language models by applying structural causal modeling and counterfactual reasoning to improve input-output alignment.
Contribution
It presents a novel causal inference approach using structural causal modeling to mitigate biases from modality priors in MLLMs, enhancing their alignment and reducing hallucinations.
Findings
Achieved up to a 65.3% score improvement on VLind-Bench indicators.
Improved MME Benchmark scores by up to 164 points.
Validated effectiveness through extensive experiments.
Abstract
Multimodal Large Language Models (MLLMs) have emerged as a central focus in both industry and academia, but often suffer from biases introduced by visual and language priors, which can lead to multimodal hallucination. These biases arise from the visual encoder and the Large Language Model (LLM) backbone, affecting the attention mechanism responsible for aligning multimodal inputs. Existing decoding-based mitigation methods focus on statistical correlations and overlook the causal relationships between attention mechanisms and model output, limiting their effectiveness in addressing these biases. To tackle this issue, we propose a causal inference framework termed CausalMM that applies structural causal modeling to MLLMs, treating modality priors as a confounder between attention mechanisms and output. Specifically, by employing backdoor adjustment and counterfactual reasoning at both…
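The counterfactual-attention idea mentioned in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the single-query attention, the linear `readout`, the choice of uniform attention as the counterfactual, and the `(1 + gamma)` / `gamma` contrast form (borrowed from common contrastive-decoding practice) are all hypothetical. The idea is that a counterfactual pass with uninformative attention exposes what the model would output from its prior alone, and subtracting it isolates the effect of the real attention.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(weights, values):
    """Weighted sum of value vectors under the given attention weights."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def output_logits(context, readout):
    """Toy linear readout from the attended context to output logits (hypothetical)."""
    return [sum(c * r for c, r in zip(context, row)) for row in readout]

def counterfactual_contrast(scores, values, readout, gamma=1.0):
    """Contrast factual logits against logits from a counterfactual attention map.

    Assumption: the counterfactual is uniform attention; gamma is a
    hypothetical strength parameter for the contrast.
    """
    # Factual pass: attention computed from the real scores.
    factual_attn = softmax(scores)
    factual = output_logits(attend(factual_attn, values), readout)

    # Counterfactual pass: attention replaced by a uniform distribution,
    # so the result no longer depends on where the model actually looked.
    uniform_attn = [1.0 / len(scores)] * len(scores)
    counterfactual = output_logits(attend(uniform_attn, values), readout)

    # Amplify the factual logits and subtract the counterfactual ones.
    return [(1 + gamma) * f - gamma * c for f, c in zip(factual, counterfactual)]

if __name__ == "__main__":
    scores = [2.0, 0.0, 0.0]                       # query attends mostly to token 0
    values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # toy value vectors
    readout = [[1.0, 0.0], [0.0, 1.0]]             # identity readout
    print(counterfactual_contrast(scores, values, readout, gamma=1.0))
```

With `gamma=0` the function returns the unmodified factual logits. Larger values of `gamma` put more weight on the difference between the factual and counterfactual passes.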
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Causal inference · Focus
