VinDr-CXR-VQA: A Visual Question Answering Dataset for Explainable Chest X-Ray Analysis with Multi-Task Learning
Dang H. Nguyen, Hieu H. Pham, Hao T. Nguyen, Hieu H. Pham

TL;DR
VinDr-CXR-VQA introduces a comprehensive, annotated chest X-ray dataset for explainable visual question answering, enabling improved clinical interpretability and lesion localization in medical AI models.
Contribution
This work provides the first large-scale, annotated Med-VQA dataset with spatial grounding and a diverse question taxonomy for chest X-ray analysis.
Findings
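Benchmarking with MedGemma-4B-it yields F1 = 0.624 (+11.8% over baseline) while also enabling lesion localization.
The dataset includes 17,597 QA pairs across 4,394 images, with radiologist-verified bounding boxes and clinical reasoning explanations.
A split of 41.7% positive and 58.3% negative samples is used to mitigate hallucinations in normal cases.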
Abstract
We present VinDr-CXR-VQA, a large-scale chest X-ray dataset for explainable Medical Visual Question Answering (Med-VQA) with spatial grounding. The dataset contains 17,597 question-answer pairs across 4,394 images, each annotated with radiologist-verified bounding boxes and clinical reasoning explanations. Our question taxonomy spans six diagnostic types (Where, What, Is there, How many, Which, and Yes/No), capturing diverse clinical intents. To improve reliability, we construct a balanced distribution of 41.7% positive and 58.3% negative samples, mitigating hallucinations in normal cases. Benchmarking with MedGemma-4B-it demonstrates improved performance (F1 = 0.624, +11.8% over baseline) while enabling lesion localization. VinDr-CXR-VQA aims to advance reproducible and clinically grounded Med-VQA research. The dataset and evaluation tools are publicly available at…
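To make the annotation structure described in the abstract concrete, the following is a minimal Python sketch of what a single QA record might look like. The class name `QARecord`, its field names, the `(x_min, y_min, x_max, y_max)` box convention, and the use of `None` for samples without a localized finding are all assumptions made for illustration; they are not taken from the released dataset, whose actual schema may differ. Only the six question types come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The six question types listed in the abstract.
QUESTION_TYPES = ("Where", "What", "Is there", "How many", "Which", "Yes/No")

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass
class QARecord:
    """Hypothetical record layout; field names are illustrative only."""
    image_id: str
    question_type: str
    question: str
    answer: str
    explanation: str          # clinical reasoning text
    bbox: Optional[BBox]      # assumed None when no finding is localized
    is_positive: bool         # True if the sample concerns an abnormal finding

    def __post_init__(self) -> None:
        # Reject question types outside the six-way taxonomy.
        if self.question_type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.question_type!r}")
        # Reject degenerate boxes.
        if self.bbox is not None:
            x_min, y_min, x_max, y_max = self.bbox
            if x_min >= x_max or y_min >= y_max:
                raise ValueError(f"degenerate bounding box: {self.bbox}")


def positive_fraction(records: List[QARecord]) -> float:
    """Share of records flagged as positive (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(r.is_positive for r in records) / len(records)


# Toy records with made-up content, only to exercise the sketch.
example_records = [
    QARecord("img_001", "Where", "Where is the lesion?", "Right lower lung",
             "Opacity in the right lower zone.", (10.0, 20.0, 110.0, 220.0), True),
    QARecord("img_002", "Is there", "Is there any abnormality?", "No",
             "Lung fields are clear.", None, False),
]
```

On a loaded copy of the dataset, a check such as `positive_fraction(records)` could be compared against the reported 41.7% positive share, assuming that figure is computed over QA pairs rather than over images.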
Taxonomy
Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
