Causal-LLaVA: Causal Disentanglement for Mitigating Hallucination in Multimodal Large Language Models

Xinmiao Hu; Chun Wang; Ruihe An; ChenYu Shao; Xiaojun Ye; Sheng Zhou; Liangcheng Li (Zhejiang University)

arXiv:2505.19474·cs.AI·May 27, 2025

TL;DR

Causal-LLaVA introduces a causality-based framework with intervention modules to reduce object hallucinations in multimodal large language models, improving factual accuracy without sacrificing performance.

Contribution

The paper presents a causality-driven disentanglement approach built on two modules, a Causal-Driven Projector and a Causal Intervention Module, to mitigate hallucinations caused by dataset biases in MLLMs.

Findings

1. Significantly reduces object hallucinations in MLLMs.
2. Maintains high performance on multimodal benchmarks.
3. Visualization confirms better object representation separability.

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated strong performance in visual understanding tasks, yet they often suffer from object hallucinations: generating descriptions of objects that are inconsistent with or entirely absent from the input. This issue is closely related to dataset biases, where frequent co-occurrences of objects lead to entangled semantic representations across modalities. As a result, models may erroneously activate object representations that are commonly associated with the input but not actually present. To address this, we propose a causality-driven disentanglement framework that mitigates hallucinations through causal intervention. Our approach includes a Causal-Driven Projector in the visual pathway and a Causal Intervention Module integrated into the final transformer layer of the language model. These components work together to reduce spurious…
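
The dataset bias the abstract points to, frequent co-occurrence of objects, can be made concrete with a small counting sketch. The code below is not taken from the paper or its repository; the function names, the annotation format, and the toy data are all assumptions made for illustration. It estimates how often one object appears in images that contain another, which is the kind of statistic that could push a model to mention a chair whenever it sees a dining table.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(annotations):
    """Count per-object and per-pair image frequencies.

    `annotations` is a list of per-image object-label lists
    (a hypothetical toy format, not the paper's data format).
    """
    object_counts = Counter()
    pair_counts = Counter()
    for labels in annotations:
        unique = sorted(set(labels))  # count each object once per image
        object_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # pairs are in sorted order
    return object_counts, pair_counts


def conditional_cooccurrence(object_counts, pair_counts, given, target):
    """Estimate P(target in image | given in image) from image-level counts."""
    if object_counts[given] == 0:
        return 0.0
    pair = tuple(sorted((given, target)))  # match the sorted key order used above
    return pair_counts[pair] / object_counts[given]


# Toy annotations, invented for this example only.
toy_annotations = [
    ["dining table", "chair", "cup"],
    ["dining table", "chair"],
    ["dining table", "chair", "person"],
    ["dining table", "cup"],
    ["person", "dog"],
]

object_counts, pair_counts = cooccurrence_stats(toy_annotations)
p_chair_given_table = conditional_cooccurrence(
    object_counts, pair_counts, "dining table", "chair"
)
print(f"P(chair | dining table) = {p_chair_given_table:.2f}")
```

In this toy set, a chair appears in three of the four images that contain a dining table. A model trained on data with such skew may associate the two strongly enough to report a chair that is not in the image, which is the failure mode the proposed causal intervention is meant to reduce.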

Code & Models

Repositories

ignisavium/causal-llava
PyTorch · Official

Taxonomy

Topics: Misinformation and Its Impacts · Computational and Text Analysis Methods · Machine Learning in Healthcare