Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Meiqi Chen, Yixin Cao, Yan Zhang, and Chaochao Lu

TL;DR
This paper identifies unimodal biases in multimodal large language models and introduces a challenging new dataset and a causality-based framework to mitigate them, improving reasoning and understanding in multimodal tasks.
Contribution
It presents a causal analysis framework, a new dataset, and a bias-mitigating agent framework to enhance the robustness of multimodal large language models.
Findings
MLLMs perform poorly on the MORE dataset due to unimodal biases.
The CAVE framework improves reasoning and reduces biases in MLLMs.
Insights contribute to developing more robust multimodal AI systems.
Abstract
Recent advancements in Large Language Models (LLMs) have facilitated the development of Multimodal LLMs (MLLMs). Despite their impressive capabilities, MLLMs often suffer from over-reliance on unimodal biases (e.g., language bias and vision bias), leading to incorrect answers or hallucinations in complex multimodal tasks. To investigate this issue, we propose a causal framework to interpret the biases in Visual Question Answering (VQA) problems. Within this framework, we conduct an in-depth causal analysis to assess the causal effect of these biases on MLLM predictions. Based on the analysis, we introduce 1) a novel MORE dataset with 12,000 challenging VQA instances requiring multi-hop reasoning and overcoming unimodal biases; and 2) a causality-enhanced agent framework CAVE that guides models to comprehensively integrate information from different modalities and mitigate biases. Our…
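The idea of measuring how much a prediction depends on a single modality can be illustrated with a simple ablation probe. The sketch below is not the paper's causal analysis and not part of CAVE; it is a minimal illustration under stated assumptions. The names `VQAExample`, `accuracy`, `unimodal_reliance`, and `language_biased_model` are hypothetical, the "image" is a plain string stand-in, and the two examples are toy data invented for the demonstration.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class VQAExample(NamedTuple):
    """Hypothetical container for one VQA instance."""
    question: str
    image: Optional[str]  # stand-in for image data (assumption)
    answer: str


# A model here is any callable mapping (question, image) to an answer string.
Model = Callable[[str, Optional[str]], str]


def accuracy(model: Model, examples: Sequence[VQAExample],
             drop_image: bool = False, drop_question: bool = False) -> float:
    """Exact-match accuracy, optionally withholding one modality."""
    correct = 0
    for ex in examples:
        question = "" if drop_question else ex.question
        image = None if drop_image else ex.image
        if model(question, image) == ex.answer:
            correct += 1
    return correct / len(examples)


def unimodal_reliance(model: Model, examples: Sequence[VQAExample]) -> dict:
    """Compare full-input accuracy with single-modality accuracy."""
    return {
        "full": accuracy(model, examples),
        "question_only": accuracy(model, examples, drop_image=True),
        "image_only": accuracy(model, examples, drop_question=True),
    }


def language_biased_model(question: str, image: Optional[str]) -> str:
    """Toy model that ignores the image and answers from a language prior."""
    return "yellow" if "banana" in question else "unknown"


# Toy data (invented): the same question with two different images.
examples = [
    VQAExample("What color is the banana?", "green_banana.jpg", "green"),
    VQAExample("What color is the banana?", "yellow_banana.jpg", "yellow"),
]

result = unimodal_reliance(language_biased_model, examples)
print(result)
```

In this toy setup the question-only accuracy equals the full-input accuracy, which is the pattern one would expect from a model that leans on a language prior and gains nothing from the image. A real evaluation would need a proper dataset, a real MLLM, and a more careful treatment of confounding than this probe provides.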
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
