Structure Causal Models and LLMs Integration in Medical Visual Question Answering
Zibo Xu, Qiang Li, Weizhi Nie, Weijie Wang, Anan Liu

TL;DR
This paper introduces a causal inference framework for Medical Visual Question Answering that reduces confounding biases between images and questions, improving accuracy and causal interpretability in complex medical data.
Contribution
It presents a novel causal graph structure, a multi-variable resampling front-door adjustment method, and a prompt strategy to enhance MedVQA performance and causal understanding.
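For readers unfamiliar with front-door adjustment, the following is a minimal sketch of the standard textbook formula, P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | x', m) P(x'), on a toy discrete model. It is not the paper's multi-variable resampling variant, which this summary does not describe in detail. All variable names and probabilities here are hypothetical: `U` stands in for an unobserved confounder, `X` for the input, `M` for a mediator, and `Y` for the answer.

```python
import numpy as np

def front_door(joint):
    """Standard front-door adjustment on a discrete joint.

    joint[x, m, y] = P(X=x, M=m, Y=y); returns do[x, y] = P(Y=y | do(X=x)).
    Assumes every (x, m) pair has non-zero probability.
    """
    p_x = joint.sum(axis=(1, 2))                 # P(x)
    p_xm = joint.sum(axis=2)                     # P(x, m)
    p_m_given_x = p_xm / p_x[:, None]            # P(m | x)
    p_y_given_xm = joint / p_xm[:, :, None]      # P(y | x, m)
    # inner[m, y] = sum over x' of P(y | x', m) * P(x')
    inner = np.einsum("xmy,x->my", p_y_given_xm, p_x)
    # do[x, y] = sum over m of P(m | x) * inner[m, y]
    return p_m_given_x @ inner

# Hypothetical toy model: U -> X, U -> Y, X -> M -> Y (all binary).
p_u = np.array([0.6, 0.4])                            # P(u)
p_x_given_u = np.array([[0.8, 0.2], [0.3, 0.7]])      # indexed [u, x]
p_m_given_x = np.array([[0.9, 0.1], [0.2, 0.8]])      # indexed [x, m]
p_y1_given_mu = np.array([[0.1, 0.5], [0.6, 0.9]])    # P(Y=1 | m, u), indexed [m, u]
p_y_given_mu = np.stack([1 - p_y1_given_mu, p_y1_given_mu], axis=-1)  # [m, u, y]

# Observed joint P(x, m, y): the confounder U is marginalized out.
joint = np.einsum("u,ux,xm,muy->xmy", p_u, p_x_given_u, p_m_given_x, p_y_given_mu)

# Ground-truth interventional distribution, computable here only because
# the toy model exposes U.
truth = np.einsum("xm,u,muy->xy", p_m_given_x, p_u, p_y_given_mu)

# Front-door estimate uses the observed joint alone.
estimate = front_door(joint)

# Naive conditional P(y | x), which stays biased by the confounder.
conditional = joint.sum(axis=1) / joint.sum(axis=(1, 2))[:, None]
```

In this toy setting `estimate` recovers `truth` from observational data alone, while `conditional` does not, which is the kind of confounding bias the front-door criterion is meant to remove.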
Findings
Significant accuracy improvements on three MedVQA datasets
Effective elimination of confounding effects in medical data
Enhanced model understanding of causal relationships
Abstract
Medical Visual Question Answering (MedVQA) aims to answer medical questions according to medical images. However, the complexity of medical data leads to confounders that are difficult to observe, so bias between images and questions is inevitable. Such cross-modal bias makes it challenging to infer medically meaningful answers. In this work, we propose a causal inference framework for the MedVQA task, which effectively eliminates the relative confounding effect between the image and the question to ensure the precision of the question-answering (QA) session. We are the first to introduce a novel causal graph structure that represents the interaction between visual and textual elements, explicitly capturing how different questions influence visual features. During optimization, we apply mutual information to discover spurious correlations and propose a multi-variable resampling…
Taxonomy
Methods: Causal inference · ALIGN
