RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology

Wenxuan Li; Pedro R. A. S. Bassi; Xinze Zhou; Jakob Wasserthal; Alan L. Yuille; Zongwei Zhou

arXiv:2605.10761 · cs.CV · May 12, 2026

TL;DR

RadThinking is a VQA dataset for training and evaluating AI systems on clinical reasoning in cancer screening. It is grounded in clinical reporting standards and released at three difficulty tiers.

Contribution

RadThinking is presented as the first cancer-screening VQA dataset with stratified reasoning levels. It is grounded in clinical reporting standards and supports training for multi-step clinical reasoning.

Findings

01 Includes 20,362 CT scans from 9,131 patients across 43 cancer groups.

02 Provides chain-of-thought data: each compositional VQA is released with the chain of foundation VQAs that solves it.

03 Enables evaluation of AI reasoning beyond detection.

Abstract

Cancer screening is a reasoning task. A radiologist observes findings, compares them to prior scans, integrates clinical context, and reaches a diagnostic conclusion confirmed by pathology. We present RadThinking, a Visual Question Answering (VQA) dataset that makes this reasoning explicit and trainable. RadThinking releases VQA pairs at three difficulty tiers. Foundation VQAs are atomic perception questions. Single-step reasoning VQAs apply one clinical rule. Compositional VQAs require multi-step chain-of-thought to reach a guideline category such as LI-RADS-5. For every compositional VQA, we release the chain of foundation VQAs that solves it. The chain follows the rules of the governing clinical reporting standard. The dataset spans 20,362 CT scans from 9,131 patients across 43 cancer groups, plus 2,077 verified healthy controls with >1-year follow-up. To our knowledge, RadThinking…
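
The three-tier structure described in the abstract can be pictured as a small data model. The following is a minimal sketch, not the dataset's actual schema: the class and field names (`Tier`, `VQA`, `key`, `chain`) are hypothetical, and `toy_rule` is a deliberately simplified stand-in for a reporting-standard rule, not the real LI-RADS criteria and not clinical guidance.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Tier(Enum):
    FOUNDATION = "foundation"        # atomic perception question
    SINGLE_STEP = "single_step"      # applies one clinical rule
    COMPOSITIONAL = "compositional"  # needs a multi-step chain


@dataclass
class VQA:
    question: str
    answer: str
    tier: Tier
    key: str = ""  # hypothetical short name for the fact this item yields
    # For compositional items: the foundation VQAs that solve it, in order.
    chain: List["VQA"] = field(default_factory=list)


def toy_rule(facts: Dict[str, str]) -> str:
    """Toy stand-in for a guideline rule (assumption, NOT real LI-RADS)."""
    if (
        facts["arterial_enhancement"] == "yes"
        and facts["washout"] == "yes"
        and float(facts["size_mm"]) >= 20
    ):
        return "LI-RADS-5"
    return "not LI-RADS-5"


def chain_is_consistent(item: VQA, rule: Callable[[Dict[str, str]], str]) -> bool:
    """Check that the chain is all foundation VQAs and reproduces the answer."""
    if any(step.tier is not Tier.FOUNDATION for step in item.chain):
        return False
    facts = {step.key: step.answer for step in item.chain}
    return rule(facts) == item.answer


# Illustrative example values, not taken from the dataset.
chain = [
    VQA("Does the lesion enhance in the arterial phase?", "yes",
        Tier.FOUNDATION, key="arterial_enhancement"),
    VQA("Does the lesion show washout?", "yes",
        Tier.FOUNDATION, key="washout"),
    VQA("What is the lesion diameter in mm?", "24",
        Tier.FOUNDATION, key="size_mm"),
]
item = VQA("Which guideline category applies?", "LI-RADS-5",
           Tier.COMPOSITIONAL, key="category", chain=chain)

print(chain_is_consistent(item, toy_rule))
```

Storing the chain alongside the compositional item lets each intermediate answer be checked on its own, which is what would allow an evaluation to separate perception errors from rule-application errors.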

Code & Models

Datasets

wenxuanchelsea/RadThinking
dataset · 153 dl
