VinDr-CXR-VQA: A Visual Question Answering Dataset for Explainable Chest X-Ray Analysis with Multi-Task Learning

Dang H. Nguyen; Hieu H. Pham; Hao T. Nguyen; Hieu H. Pham

arXiv:2511.00504 · cs.CV · November 11, 2025

TL;DR

VinDr-CXR-VQA introduces a comprehensive, annotated chest X-ray dataset for explainable visual question answering, enabling improved clinical interpretability and lesion localization in medical AI models.

Contribution

This work provides the first large-scale, annotated Med-VQA dataset with spatial grounding and a diverse question taxonomy for chest X-ray analysis.

Findings

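01. Benchmarking with MedGemma-4B-it reaches F1 = 0.624 (+11.8% over baseline).

02. The dataset contains 17,597 QA pairs across 4,394 images, with radiologist-verified annotations.

03. A split of 41.7% positive and 58.3% negative samples is intended to mitigate hallucinations in normal cases.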

Abstract

We present VinDr-CXR-VQA, a large-scale chest X-ray dataset for explainable Medical Visual Question Answering (Med-VQA) with spatial grounding. The dataset contains 17,597 question-answer pairs across 4,394 images, each annotated with radiologist-verified bounding boxes and clinical reasoning explanations. Our question taxonomy spans six diagnostic types (Where, What, Is there, How many, Which, and Yes/No), capturing diverse clinical intents. To improve reliability, we construct a balanced distribution of 41.7% positive and 58.3% negative samples, mitigating hallucinations in normal cases. Benchmarking with MedGemma-4B-it demonstrates improved performance (F1 = 0.624, +11.8% over baseline) while enabling lesion localization. VinDr-CXR-VQA aims to advance reproducible and clinically grounded Med-VQA research. The dataset and evaluation tools are publicly available at…
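
To make the abstract's description concrete, the following is a minimal Python sketch of how one question-answer record and two summary quantities might be represented. It is not the released schema or the authors' evaluation code: the `QARecord` class, all of its field names, the `(x_min, y_min, x_max, y_max)` box convention, and the rule that a record counts as positive when it carries a bounding box are assumptions made for illustration. The `f1_score` helper uses the standard count-based definition of F1; how the paper aggregates its reported F1 is not stated in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The six question types named in the abstract.
QUESTION_TYPES = ("Where", "What", "Is there", "How many", "Which", "Yes/No")


@dataclass
class QARecord:
    """One QA pair. All field names are hypothetical, not the released schema."""
    image_id: str
    question_type: str
    question: str
    answer: str
    explanation: str
    # Assumed box convention: (x_min, y_min, x_max, y_max); None for negative cases.
    bbox: Optional[Tuple[float, float, float, float]] = None

    def __post_init__(self) -> None:
        # Reject question types outside the six-way taxonomy.
        if self.question_type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.question_type!r}")

    @property
    def is_positive(self) -> bool:
        # Assumption: a record is "positive" when it has a localized finding.
        return self.bbox is not None


def positive_fraction(records: List[QARecord]) -> float:
    """Share of records that are positive under the assumption above."""
    if not records:
        raise ValueError("records must not be empty")
    return sum(r.is_positive for r in records) / len(records)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from counts: 2*TP / (2*TP + FP + FN); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Applied to the full dataset, `positive_fraction` would be the kind of check that corresponds to the reported 41.7% / 58.3% split, provided the released labels match the assumption made here. For scale, 17,597 pairs over 4,394 images works out to roughly four questions per image.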

Code & Models

Datasets

faizan711/VinDR-CXR-VQA (dataset · 36 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning