Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports

Francesco Dalla Serra, Patrick Schrempf, Chaoyang Wang, Zaiqiao Meng, Fani Deligianni, and Alison Q. O'Neil

arXiv:2505.16624 · cs.CV · May 23, 2025

Open Access

TL;DR

This paper introduces a unified approach to Chest X-ray VQA that leverages radiology reports to improve answer accuracy for both single-image and image-difference questions, achieving state-of-the-art results.

Contribution

It extends prior work by integrating radiology report generation into the VQA process, enhancing model performance for temporal change detection and abnormality identification.

Findings

01. Supplying radiology reports as additional input improves VQA answer accuracy.

02. The model handles both single-image and image-difference questions.

03. The model achieves state-of-the-art results on the Medical-Diff-VQA dataset.

Abstract

We present a novel approach to Chest X-ray (CXR) Visual Question Answering (VQA), addressing both single-image and image-difference questions. Single-image questions focus on abnormalities within a specific CXR ("What abnormalities are seen in image X?"), while image-difference questions compare two longitudinal CXRs acquired at different time points ("What are the differences between image X and Y?"). We further explore how the integration of radiology reports can enhance the performance of VQA models. While previous approaches have demonstrated the utility of radiology reports during the pre-training phase, we extend this idea by showing that the reports can also be leveraged as additional input to improve the VQA model's predicted answers. First, we propose a unified method that handles both types of questions and auto-regressively generates the answers. For single-image questions, the…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Focus