Dual Causal Inference: Integrating Backdoor Adjustment and Instrumental Variable Learning for Medical VQA

Zibo Xu; Qiang Li; Ke Lu; Jin Wang; Weizhi Nie; Yuting Su

arXiv:2604.20306 · cs.CV · April 23, 2026

TL;DR

This paper introduces Dual Causal Inference (DCI), a causal inference framework for medical visual question answering (MedVQA) that jointly addresses observable and unobserved confounders to improve robustness and interpretability.

Contribution

The proposed Dual Causal Inference (DCI) framework is, to the authors' knowledge, the first unified architecture to combine backdoor adjustment and instrumental variable learning for MedVQA, with the aim of strengthening causal reasoning and out-of-distribution generalization.

Findings

01

Outperforms existing methods on four benchmark datasets.

02

Improves robustness in out-of-distribution scenarios.

03

Enhances interpretability by disentangling causal effects.

Abstract

Medical Visual Question Answering (MedVQA) aims to generate clinically reliable answers conditioned on complex medical images and questions. However, existing methods often overfit to superficial cross-modal correlations, neglecting the intrinsic biases embedded in multimodal medical data. Consequently, models become vulnerable to cross-modal confounding effects, severely hindering their ability to provide trustworthy diagnostic reasoning. To address this limitation, we propose a novel Dual Causal Inference (DCI) framework for MedVQA. To the best of our knowledge, DCI is the first unified architecture that integrates Backdoor Adjustment (BDA) and Instrumental Variable (IV) learning to jointly tackle both observable and unobserved confounders. Specifically, we formulate a Structural Causal Model (SCM) where observable cross-modal biases (e.g., frequent visual and textual co-occurrences)…
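The two causal tools named in the abstract can be illustrated on toy data. The sketch below is not the DCI architecture, which operates on learned multimodal features. It is a minimal numerical illustration of the two textbook estimators, under assumptions: a binary observed confounder Z for backdoor adjustment, and a linear model with an unobserved confounder U and an instrument W for the instrumental variable estimate. All names, probability tables, and coefficients are hypothetical.

```python
import random

# Hypothetical toy tables: Z is an observed binary confounder.
P_Z = {0: 0.5, 1: 0.5}                      # P(Z=z)
P_Y_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.2,   # P(Y=1 | X=x, Z=z)
                (0, 1): 0.6, (1, 1): 0.7}


def backdoor_adjust(p_y_given_xz, p_z, x):
    """Backdoor adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Assumed linear model with an unobserved confounder u and instrument w."""
    rng = random.Random(seed)
    w, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)    # unobserved confounder: drives both x and y
        wi = rng.gauss(0, 1)   # instrument: affects x, independent of u
        xi = wi + u + rng.gauss(0, 1)
        yi = true_effect * xi + 3.0 * u + rng.gauss(0, 1)
        w.append(wi)
        x.append(xi)
        y.append(yi)
    return w, x, y


def ols_slope(x, y):
    """Naive regression slope; biased here because u is not controlled for."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(w, x, y):
    """Instrumental variable (Wald) estimator: cov(w, y) / cov(w, x)."""
    return covariance(w, y) / covariance(w, x)


if __name__ == "__main__":
    effect = backdoor_adjust(P_Y_GIVEN_XZ, P_Z, 1) - backdoor_adjust(P_Y_GIVEN_XZ, P_Z, 0)
    print("backdoor-adjusted effect of X on Y:", effect)
    w, x, y = simulate()
    print("naive OLS slope:", ols_slope(x, y))
    print("IV slope:", iv_slope(w, x, y))
```

In this simulation the naive slope is pulled away from the true effect of 2 (toward roughly 3) because U influences both X and Y, while the IV estimate should land near 2 because W reaches Y only through X. Backdoor adjustment handles the confounder that can be observed and summed over. The instrument handles the one that cannot, which is the division of labor the abstract describes.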
