MedCFVQA: A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering

Shuchang Ye; Usman Naseem; Mingyuan Meng; Dagan Feng; Jinman Kim

arXiv:2505.16209 · cs.CV · May 26, 2025

TL;DR

This paper introduces MedCFVQA, a causal approach that mitigates modality preference bias in medical visual question answering. The model leverages causal graphs to remove the bias at inference time, and the authors reconstruct existing datasets to reduce question-answer prior dependencies. MedCFVQA outperforms non-causal models on multiple datasets.

Contribution

The paper proposes a novel causal MedVQA model that reduces modality preference bias. It also reconstructs existing datasets so that evaluation better reflects true multimodal understanding.

Findings

01. MedCFVQA outperforms non-causal models on multiple datasets.

02. Reconstructed datasets reduce prior dependency issues.

03. Causal approach effectively mitigates modality bias.

Abstract

Medical Visual Question Answering (MedVQA) is crucial for enhancing the efficiency of clinical diagnosis by providing accurate and timely responses to clinicians' inquiries regarding medical images. Existing MedVQA models suffer from modality preference bias, where predictions are heavily dominated by one modality while the other is overlooked (in MedVQA, questions usually dominate the answer while images are overlooked), so the models fail to learn multimodal knowledge. To overcome this bias, we propose a Medical CounterFactual VQA (MedCFVQA) model, which trains with bias and leverages causal graphs to eliminate the modality preference bias during inference. Existing MedVQA datasets exhibit substantial prior dependencies between questions and answers, which results in acceptable performance even if the model suffers significantly from the modality preference bias. To…
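
The idea of training with the bias and removing it at inference can be illustrated with a small sketch. This is not the paper's formulation, which the abstract does not spell out; it is a generic counterfactual-style subtraction in which a question-only branch captures the language prior and its contribution is subtracted from the fused prediction. The function `debiased_scores`, the weight `alpha`, the additive fusion, and the toy logits are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array of logits."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def debiased_scores(fused_logits, question_only_logits, alpha=1.0):
    """Hypothetical helper: subtract the question-only (bias) effect.

    total  : prediction when both the fused branch and the
             question-only branch contribute (as during training)
    direct : prediction the question-only branch would make alone
    The difference keeps what the answer owes to more than the question.
    """
    total = log_softmax(fused_logits + question_only_logits)
    direct = log_softmax(question_only_logits)
    return total - alpha * direct


# Toy example (made-up numbers): the language prior favors "yes",
# while the image-aware branch favors "no".
answers = ["no", "yes"]
q_only = np.array([0.0, 2.0])
fused = np.array([1.5, 0.0])

biased = answers[int(np.argmax(fused + q_only))]
debiased = answers[int(np.argmax(debiased_scores(fused, q_only)))]
print(biased, debiased)
```

In this toy case the biased ensemble follows the question prior, while the subtraction lets the image-aware evidence decide the answer. How MedCFVQA actually defines and combines these effects should be taken from the paper itself.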
