ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding

Ankit Pal; Jung-Oh Lee; Xiaoman Zhang; Malaikannan Sankarasubbu; Seunghyeon Roh; Won Jung Kim; Meesun Lee; and Pranav Rajpurkar

arXiv:2506.04353·cs.CV·June 6, 2025

TL;DR

ReXVQA is a large-scale benchmark for chest X-ray visual question answering that enables evaluation of AI models' clinical reasoning; the best model evaluated on it surpasses radiologists in accuracy.

Contribution

It introduces the largest VQA dataset for chest X-rays, built from diverse, clinically authentic questions, and evaluates eight state-of-the-art multimodal large language models on it.

Findings

01

The best-performing AI model achieved over 83% accuracy, surpassing radiologists' 77%.

02

ReXVQA enables detailed evaluation of AI reasoning skills.

03

The benchmark includes public leaderboards and structured explanations.

Abstract

We present ReXVQA, the largest and most comprehensive benchmark for visual question answering (VQA) in chest radiology, comprising approximately 696,000 questions paired with 160,000 chest X-ray studies across training, validation, and test sets. Unlike prior efforts that rely heavily on template-based queries, ReXVQA introduces a diverse and clinically authentic task suite reflecting five core radiological reasoning skills: presence assessment, location analysis, negation detection, differential diagnosis, and geometric reasoning. We evaluate eight state-of-the-art multimodal large language models, including MedGemma-4B-it, Qwen2.5-VL, Janus-Pro-7B, and Eagle2-9B. The best-performing model (MedGemma) achieves 83.24% overall accuracy. To bridge the gap between AI performance and clinical expertise, we conducted a comprehensive human reader study involving 3 radiology residents on 200…
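The abstract reports an overall accuracy alongside five reasoning skills. As a rough illustration of how such scores could be tallied, the sketch below computes overall and per-skill accuracy by exact match. The record layout (keys `skill`, `answer`, `prediction`) and the function name `accuracy_by_skill` are assumptions made for this example; they are not the benchmark's actual data format or official scoring code.

```python
from collections import defaultdict

# Skill names are taken from the abstract; everything else here is hypothetical.
SKILLS = (
    "presence assessment",
    "location analysis",
    "negation detection",
    "differential diagnosis",
    "geometric reasoning",
)


def accuracy_by_skill(records):
    """Return (overall_accuracy, {skill: accuracy}) using exact-match scoring.

    `records` is an iterable of dicts with the assumed keys
    'skill', 'answer', and 'prediction'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        skill = rec["skill"]
        total[skill] += 1
        # Compare after trimming whitespace and ignoring case.
        if rec["prediction"].strip().lower() == rec["answer"].strip().lower():
            correct[skill] += 1

    per_skill = {skill: correct[skill] / total[skill] for skill in total}
    n_questions = sum(total.values())
    overall = sum(correct.values()) / n_questions if n_questions else 0.0
    return overall, per_skill


if __name__ == "__main__":
    # Toy records for illustration only; these are not benchmark data.
    toy = [
        {"skill": SKILLS[0], "answer": "A", "prediction": "A"},
        {"skill": SKILLS[0], "answer": "B", "prediction": "C"},
        {"skill": SKILLS[2], "answer": "D", "prediction": " d "},
    ]
    overall, per_skill = accuracy_by_skill(toy)
    print(f"overall: {overall:.2%}")
    for skill, acc in per_skill.items():
        print(f"{skill}: {acc:.2%}")
```

Reporting the per-skill breakdown next to the overall figure shows whether a model's headline accuracy hides a weakness in one skill, such as negation detection.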

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning