RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology
Wenxuan Li, Pedro R. A. S. Bassi, Xinze Zhou, Jakob Wasserthal, Alan L. Yuille, Zongwei Zhou

TL;DR
RadThinking is a VQA dataset for evaluating and training AI systems on clinical reasoning in cancer screening. It is grounded in clinical reporting standards and spans three difficulty tiers.
Contribution
It introduces the first cancer-screening VQA dataset with stratified reasoning levels, grounded in clinical standards, and supports training for multi-step clinical reasoning.
Findings
Includes 20,362 CT scans from 9,131 patients across 43 cancer groups, plus 2,077 verified healthy controls with >1-year follow-up.
Provides, for every compositional VQA, the chain of foundation VQAs that solves it.
Enables evaluation of AI reasoning beyond detection.
Abstract
Cancer screening is a reasoning task. A radiologist observes findings, compares them to prior scans, integrates clinical context, and reaches a diagnostic conclusion confirmed by pathology. We present RadThinking, a Visual Question Answering (VQA) dataset that makes this reasoning explicit and trainable. RadThinking releases VQA pairs at three difficulty tiers. Foundation VQAs are atomic perception questions. Single-step reasoning VQAs apply one clinical rule. Compositional VQAs require multi-step chain-of-thought to reach a guideline category such as LI-RADS-5. For every compositional VQA, we release the chain of foundation VQAs that solves it. The chain follows the rules of the governing clinical reporting standard. The dataset spans 20,362 CT scans from 9,131 patients across 43 cancer groups, plus 2,077 verified healthy controls with >1-year follow-up. To our knowledge, RadThinking…
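The three-tier structure described in the abstract can be pictured with a minimal sketch. This is not the dataset's released schema: the class name `VQA`, the enum `Tier`, the field names, and the `validate` helper are all hypothetical, and the example questions are invented placeholders rather than actual LI-RADS criteria. The sketch only encodes the one structural rule the abstract states, namely that each compositional VQA comes with a chain of foundation VQAs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Tier(Enum):
    FOUNDATION = "foundation"        # atomic perception question
    SINGLE_STEP = "single_step"      # applies one clinical rule
    COMPOSITIONAL = "compositional"  # multi-step chain-of-thought


@dataclass
class VQA:
    # Hypothetical record layout; the real dataset may use different fields.
    question: str
    answer: str
    tier: Tier
    # Assumed here: only compositional items carry a chain of foundation VQAs.
    chain: List["VQA"] = field(default_factory=list)


def validate(item: VQA) -> bool:
    """Check the structural rule: compositional items need a non-empty
    chain made only of foundation VQAs; other tiers carry no chain."""
    if item.tier is Tier.COMPOSITIONAL:
        return len(item.chain) > 0 and all(
            step.tier is Tier.FOUNDATION for step in item.chain
        )
    return len(item.chain) == 0


# Placeholder content for illustration only: not the dataset's wording
# and not the actual LI-RADS decision criteria.
steps = [
    VQA("Is a liver observation visible?", "yes", Tier.FOUNDATION),
    VQA("Is the observation at least 10 mm?", "yes", Tier.FOUNDATION),
]
example = VQA(
    "Which guideline category applies?",
    "LI-RADS-5",
    Tier.COMPOSITIONAL,
    chain=steps,
)
```

Calling `validate(example)` returns `True`, while a compositional item with an empty chain, or a foundation item that carries a chain, is rejected.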
