Dual Causal Inference: Integrating Backdoor Adjustment and Instrumental Variable Learning for Medical VQA
Zibo Xu, Qiang Li, Ke Lu, Jin Wang, Weizhi Nie, Yuting Su

TL;DR
This paper introduces a novel causal inference framework for medical visual question answering that jointly addresses observable and unobservable confounders to improve robustness and interpretability.
Contribution
It proposes the first unified architecture combining backdoor adjustment and instrumental variable learning for MedVQA, enhancing causal reasoning and out-of-distribution generalization.
Findings
Outperforms existing methods on four benchmark datasets.
Improves robustness in out-of-distribution scenarios.
Enhances interpretability by disentangling causal effects.
Abstract
Medical Visual Question Answering (MedVQA) aims to generate clinically reliable answers conditioned on complex medical images and questions. However, existing methods often overfit to superficial cross-modal correlations, neglecting the intrinsic biases embedded in multimodal medical data. Consequently, models become vulnerable to cross-modal confounding effects, severely hindering their ability to provide trustworthy diagnostic reasoning. To address this limitation, we propose a novel Dual Causal Inference (DCI) framework for MedVQA. To the best of our knowledge, DCI is the first unified architecture that integrates Backdoor Adjustment (BDA) and Instrumental Variable (IV) learning to jointly tackle both observable and unobserved confounders. Specifically, we formulate a Structural Causal Model (SCM) where observable cross-modal biases (e.g., frequent visual and textual co-occurrences)…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
