Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective

Meiqi Chen; Yixin Cao; Yan Zhang; and Chaochao Lu

arXiv:2403.18346·cs.CL·November 14, 2024·1 cites

TL;DR

This paper identifies unimodal biases in multimodal large language models and introduces a challenging new dataset (MORE) and a causality-based agent framework (CAVE) to mitigate them, improving reasoning and understanding in multimodal tasks.

Contribution

The paper presents a causal analysis framework, a new dataset, and a bias-mitigating agent framework, all aimed at making multimodal large language models more robust.

Findings

01. MLLMs perform poorly on the MORE dataset due to unimodal biases.

02. The CAVE framework improves reasoning and reduces biases in MLLMs.

03. Insights contribute to developing more robust multimodal AI systems.

Abstract

Recent advancements in Large Language Models (LLMs) have facilitated the development of Multimodal LLMs (MLLMs). Despite their impressive capabilities, MLLMs often suffer from over-reliance on unimodal biases (e.g., language bias and vision bias), leading to incorrect answers or hallucinations in complex multimodal tasks. To investigate this issue, we propose a causal framework to interpret the biases in Visual Question Answering (VQA) problems. Within this framework, we conduct an in-depth causal analysis to assess the causal effect of these biases on MLLM predictions. Based on the analysis, we introduce 1) a novel MORE dataset with 12,000 challenging VQA instances requiring multi-hop reasoning and overcoming unimodal biases; and 2) a causality-enhanced agent framework, CAVE, that guides models to comprehensively integrate information from different modalities and mitigate biases. Our…

Code & Models

Repositories

opencausalab/more
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling