Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports
Francesco Dalla Serra, Patrick Schrempf, Chaoyang Wang, Zaiqiao Meng, Fani Deligianni, and Alison Q. O'Neil

TL;DR
This paper introduces a unified approach to Chest X-ray VQA that leverages radiology reports to improve answer accuracy for both single-image and image-difference questions, achieving state-of-the-art results.
Contribution
It extends prior work by integrating radiology report generation into the VQA process, enhancing model performance for temporal change detection and abnormality identification.
Findings
Incorporating radiology reports improves VQA accuracy.
The model effectively handles both single-image and image-difference questions.
Achieves state-of-the-art results on the Medical-Diff-VQA dataset.
Abstract
We present a novel approach to Chest X-ray (CXR) Visual Question Answering (VQA), addressing both single-image and image-difference questions. Single-image questions focus on abnormalities within a specific CXR ("What abnormalities are seen in image X?"), while image-difference questions compare two longitudinal CXRs acquired at different time points ("What are the differences between image X and Y?"). We further explore how the integration of radiology reports can enhance the performance of VQA models. While previous approaches have demonstrated the utility of radiology reports during the pre-training phase, we extend this idea by showing that the reports can also be leveraged as additional input to improve the VQA model's predicted answers. First, we propose a unified method that handles both types of questions and auto-regressively generates the answers. For single-image questions, the…
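As a rough illustration of the two ideas the abstract does state (one interface for both question types, and report text supplied as additional input), the Python sketch below assembles the text side of a VQA input. It is not the authors' implementation: `VQAExample`, `question_type`, `build_text_input`, the field names, and the prompt layout are all hypothetical, and the report text in the usage example is invented placeholder data. The image encoder and answer decoder are out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VQAExample:
    """Hypothetical container for one CXR VQA query (not from the paper)."""
    question: str
    main_image_id: str
    # Set only for image-difference questions (the earlier, reference scan).
    reference_image_id: Optional[str] = None
    # Report text per image; in this sketch it may be absent.
    main_report: Optional[str] = None
    reference_report: Optional[str] = None


def question_type(example: VQAExample) -> str:
    """Classify the query by whether a second (reference) image is present."""
    if example.reference_image_id is not None:
        return "image-difference"
    return "single-image"


def build_text_input(example: VQAExample) -> str:
    """Concatenate any available report text with the question.

    Assumed layout: one line per report, followed by the question line.
    """
    parts = []
    if example.reference_image_id is not None and example.reference_report:
        parts.append(
            f"Report for image {example.reference_image_id}: {example.reference_report}"
        )
    if example.main_report:
        parts.append(f"Report for image {example.main_image_id}: {example.main_report}")
    parts.append(f"Question: {example.question}")
    return "\n".join(parts)
```

For example, with a made-up report string:

```python
single = VQAExample(
    question="What abnormalities are seen in image X?",
    main_image_id="X",
    main_report="Small left pleural effusion.",  # placeholder text, not real data
)
print(question_type(single))
print(build_text_input(single))
```

Keeping the question type as a derived property rather than a separate code path reflects the "unified" framing in the abstract: the same text-building function serves both cases, and only the presence of a reference image changes what is included.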
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
Methods: Focus
