Causal-LLaVA: Causal Disentanglement for Mitigating Hallucination in Multimodal Large Language Models
Xinmiao Hu, Chun Wang, Ruihe An, ChenYu Shao, Xiaojun Ye, Sheng Zhou, Liangcheng Li (Zhejiang University)

TL;DR
Causal-LLaVA introduces a causality-based framework with intervention modules to reduce object hallucinations in multimodal large language models, improving factual accuracy without sacrificing performance.
Contribution
It presents a novel causality-driven disentanglement approach with specific modules to mitigate hallucinations caused by dataset biases in MLLMs.
Findings
Significantly reduces object hallucinations in MLLMs.
Maintains high performance on multimodal benchmarks.
Visualization confirms better object representation separability.
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated strong performance in visual understanding tasks, yet they often suffer from object hallucinations--generating descriptions of objects that are inconsistent with or entirely absent from the input. This issue is closely related to dataset biases, where frequent co-occurrences of objects lead to entangled semantic representations across modalities. As a result, models may erroneously activate object representations that are commonly associated with the input but not actually present. To address this, we propose a causality-driven disentanglement framework that mitigates hallucinations through causal intervention. Our approach includes a Causal-Driven Projector in the visual pathway and a Causal Intervention Module integrated into the final transformer layer of the language model. These components work together to reduce spurious…
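The co-occurrence bias described in the abstract can be illustrated with a small counting sketch. This is not the authors' method and does not implement the Causal-Driven Projector or the Causal Intervention Module; it only shows, on hypothetical toy annotations, how frequent pairings in a dataset make one object a strong statistical predictor of another. The object names, the `images` list, and both helper functions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy annotations (hypothetical): the set of objects present in each image.
images = [
    {"table", "fork", "plate"},
    {"table", "fork", "cup"},
    {"table", "plate", "fork"},
    {"table", "laptop"},
    {"bench", "dog"},
]

def cooccurrence_stats(images):
    """Count how often each object, and each unordered object pair, appears."""
    single = Counter()
    pair = Counter()
    for objs in images:
        single.update(objs)
        pair.update(frozenset(p) for p in combinations(sorted(objs), 2))
    return single, pair

def conditional(single, pair, target, given):
    """Estimate P(target present | given present) from the counts."""
    if single[given] == 0:
        return 0.0
    return pair[frozenset((target, given))] / single[given]

single, pair = cooccurrence_stats(images)
p_fork_given_table = conditional(single, pair, "fork", "table")
p_dog_given_table = conditional(single, pair, "dog", "table")
print(f"P(fork | table) = {p_fork_given_table:.2f}")
print(f"P(dog  | table) = {p_dog_given_table:.2f}")
```

In this toy data a fork appears in three of the four images that contain a table, so a model trained on such statistics could plausibly be pushed toward mentioning a fork whenever it sees a table, even when none is present. That is the kind of spurious association the abstract attributes to dataset bias.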
Taxonomy
Topics: Misinformation and Its Impacts · Computational and Text Analysis Methods · Machine Learning in Healthcare
