Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

Guanyu Zhou; Yibo Yan; Xin Zou; Kun Wang; Aiwei Liu; Xuming Hu

arXiv:2410.04780 · cs.CV · February 19, 2025

TL;DR

This paper introduces CausalMM, a causal inference framework that reduces modality prior-induced hallucinations in multimodal large language models by applying structural causal modeling and counterfactual reasoning to improve input-output alignment.

Contribution

The paper presents a causal inference approach, based on structural causal modeling, that mitigates biases from modality priors in MLLMs, improving their alignment and reducing hallucinations.

Findings

1. Achieved up to 65.3% score improvement on VLind-Bench indicators.
2. Improved MME Benchmark scores by 164 points.
3. Validated effectiveness through extensive experiments.

Abstract

Multimodal Large Language Models (MLLMs) have emerged as a central focus in both industry and academia, but often suffer from biases introduced by visual and language priors, which can lead to multimodal hallucination. These biases arise from the visual encoder and the Large Language Model (LLM) backbone, affecting the attention mechanism responsible for aligning multimodal inputs. Existing decoding-based mitigation methods focus on statistical correlations and overlook the causal relationships between attention mechanisms and model output, limiting their effectiveness in addressing these biases. To tackle this issue, we propose a causal inference framework termed CausalMM that applies structural causal modeling to MLLMs, treating modality priors as a confounder between attention mechanisms and output. Specifically, by employing backdoor adjustment and counterfactual reasoning at both…
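The backdoor adjustment named in the abstract can be illustrated on a toy discrete model. What follows is a minimal sketch under stated assumptions, not the authors' implementation: the binary variables Z (standing in for a modality prior), A (standing in for an attention state), and Y (standing in for a correct output), along with every probability value, are hypothetical and chosen only for illustration. The sketch compares the naive conditional P(Y=1 | A=a), which mixes in the confounder's influence, with the backdoor-adjusted quantity P(Y=1 | do(A=a)) = Σ_z P(Y=1 | A=a, Z=z) P(Z=z).

```python
# Toy structural causal model with a confounder: Z -> A, Z -> Y, A -> Y.
# All names and numbers are hypothetical, chosen only to illustrate the idea.

# P(Z = z): marginal distribution of the confounder (the "modality prior").
P_Z = {0: 0.5, 1: 0.5}

# P(A = 1 | Z = z): the confounder influences the attention state.
P_A1_GIVEN_Z = {0: 0.2, 1: 0.8}

# P(Y = 1 | A = a, Z = z): the outcome depends on both attention and prior.
P_Y1_GIVEN_AZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7}


def p_a_given_z(a, z):
    """Return P(A = a | Z = z) for binary a."""
    p1 = P_A1_GIVEN_Z[z]
    return p1 if a == 1 else 1.0 - p1


def observational(a):
    """Naive conditional P(Y = 1 | A = a), which weights z by P(z | a)."""
    numerator = sum(P_Y1_GIVEN_AZ[(a, z)] * p_a_given_z(a, z) * P_Z[z] for z in P_Z)
    denominator = sum(p_a_given_z(a, z) * P_Z[z] for z in P_Z)
    return numerator / denominator


def backdoor_adjusted(a):
    """Interventional P(Y = 1 | do(A = a)), which weights z by P(z)."""
    return sum(P_Y1_GIVEN_AZ[(a, z)] * P_Z[z] for z in P_Z)


naive_gap = observational(1) - observational(0)
causal_effect = backdoor_adjusted(1) - backdoor_adjusted(0)
print(f"naive association gap: {naive_gap:.2f}")
print(f"backdoor-adjusted effect: {causal_effect:.2f}")
```

With these made-up numbers the naive gap is roughly 0.44 while the adjusted effect is roughly 0.20: more than half of the apparent association between A and Y comes from the confounder rather than from A itself. This is the kind of distinction between statistical correlation and causal effect that the abstract draws.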

Code & Models

Repositories

the-martyr/causalmm (PyTorch, official)

Videos

Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality · SlidesLive

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Causal inference · Focus