Structure Causal Models and LLMs Integration in Medical Visual Question Answering

Zibo Xu; Qiang Li; Weizhi Nie; Weijie Wang; Anan Liu

arXiv:2505.02703 · cs.CV · March 27, 2026

TL;DR

This paper introduces a causal inference framework for Medical Visual Question Answering that reduces confounding biases between images and questions, improving accuracy and causal interpretability in complex medical data.

Contribution

It presents a novel causal graph structure, a multi-variable resampling front-door adjustment method, and a prompt strategy to enhance MedVQA performance and causal understanding.

Findings

01 Significant accuracy improvements on three MedVQA datasets
02 Effective elimination of confounding effects in medical data
03 Enhanced model understanding of causal relationships

Abstract

Medical Visual Question Answering (MedVQA) aims to answer medical questions according to medical images. However, the complexity of medical data leads to confounders that are difficult to observe, so bias between images and questions is inevitable. Such cross-modal bias makes it challenging to infer medically meaningful answers. In this work, we propose a causal inference framework for the MedVQA task, which effectively eliminates the relative confounding effect between the image and the question to ensure the precision of the question-answering (QA) session. We are the first to introduce a novel causal graph structure that represents the interaction between visual and textual elements, explicitly capturing how different questions influence visual features. During optimization, we apply mutual information to discover spurious correlations and propose a multi-variable resampling…
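
The front-door adjustment named in the contribution summary can be illustrated with the textbook single-mediator formula, P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x'). The sketch below is not the paper's multi-variable resampling method. It is a minimal Python example on a hypothetical binary model (hidden confounder U, treatment X, mediator M, outcome Y, with made-up probabilities) showing that the formula recovers the interventional distribution from observational data alone, while the plain conditional P(y | x) stays biased.

```python
import itertools

# Hypothetical toy model, NOT from the paper: U -> X, X -> M, M -> Y, U -> Y.
# U is the unobserved confounder; all probabilities are illustrative assumptions.
P_U = {0: 0.6, 1: 0.4}
P_X_GIVEN_U = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # indexed [u][x]
P_M_GIVEN_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [x][m]
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Y=1 | m, u)

BINARY = (0, 1)


def p_y_given_mu(y, m, u):
    """P(Y = y | M = m, U = u) for binary Y."""
    p1 = P_Y1_GIVEN_MU[(m, u)]
    return p1 if y == 1 else 1.0 - p1


def observational_joint():
    """Joint P(x, m, y) with the hidden confounder U marginalized out."""
    joint = {}
    for x, m, y in itertools.product(BINARY, repeat=3):
        joint[(x, m, y)] = sum(
            P_U[u] * P_X_GIVEN_U[u][x] * P_M_GIVEN_X[x][m] * p_y_given_mu(y, m, u)
            for u in BINARY
        )
    return joint


def front_door(joint, x, y):
    """Front-door estimate of P(y | do(x)) using only the observed joint."""
    p_x = {a: sum(joint[(a, m, b)] for m in BINARY for b in BINARY) for a in BINARY}
    total = 0.0
    for m in BINARY:
        p_m_given_x = sum(joint[(x, m, b)] for b in BINARY) / p_x[x]
        inner = 0.0
        for x2 in BINARY:
            p_x2_m = sum(joint[(x2, m, b)] for b in BINARY)
            inner += joint[(x2, m, y)] / p_x2_m * p_x[x2]  # P(y | m, x') P(x')
        total += p_m_given_x * inner
    return total


def naive_conditional(joint, x, y):
    """Plain P(y | x), which is biased by the hidden confounder."""
    p_x = sum(joint[(x, m, b)] for m in BINARY for b in BINARY)
    return sum(joint[(x, m, y)] for m in BINARY) / p_x


def true_do(x, y):
    """Ground-truth P(y | do(x)), computed with access to the hidden U."""
    return sum(
        P_U[u] * P_M_GIVEN_X[x][m] * p_y_given_mu(y, m, u)
        for u in BINARY
        for m in BINARY
    )


if __name__ == "__main__":
    joint = observational_joint()
    print("front-door:", front_door(joint, 1, 1))
    print("true do   :", true_do(1, 1))
    print("naive     :", naive_conditional(joint, 1, 1))
```

In this toy setting the front-door estimate matches the ground truth because the mediator M depends on X only and blocks every directed path from X to Y. Whether analogous conditions hold for image and question features in MedVQA is a modeling assumption that the paper's causal graph is meant to address.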

Taxonomy

Methods: Causal inference · ALIGN