CheXTemporal: A Dataset for Temporally-Grounded Reasoning in Chest Radiography

Eva Prakash; Yunhe Gao; Chong Wang; Justin Xu; Neal Prakash; Arne Michalson; Seena Dehkharghani; Eun Kyoung Hong; Julie Bauml; Roger Boodoo; Jean-Benoit Delbrouck; Sophie Ostmeier; Curtis Langlotz

arXiv:2605.11304 · cs.CV · May 13, 2026

TL;DR

CheXTemporal introduces a dataset of paired prior and current chest X-rays with finding-level temporal and spatial annotations, designed to support and evaluate reasoning about disease progression over time.

Contribution

The paper presents CheXTemporal, a dataset of paired prior-current chest radiographs with finding-level temporal and spatial annotations, and uses it to evaluate where current models fall short at temporally grounded reasoning.

Findings

01

Current models struggle with spatial grounding and fine-grained temporal reasoning.

02

Models perform better on salient progression categories such as 'worse' than on subtler ones such as 'stable'.

03

Models show limited robustness under distribution shifts.
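A gap like the one in finding 02 is usually exposed by scoring each progression class separately rather than reporting a single overall accuracy. The sketch below computes per-class recall over the five progression labels named in the abstract. It is a hypothetical illustration: the function name `per_class_recall` and the toy labels are invented here, and this summary does not state which metrics the paper actually reports.

```python
from collections import Counter

# Five-class progression taxonomy named in the CheXTemporal abstract.
LABELS = ("new", "worse", "stable", "improved", "resolved")


def per_class_recall(gold, pred):
    """Hypothetical helper: recall for each label that occurs in `gold`.

    Recall for a label = (# examples of that label predicted correctly)
                         / (# gold examples of that label).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    support = Counter(gold)  # gold count per label
    hits = Counter(g for g, p in zip(gold, pred) if g == p)  # correct per label
    # Skip labels with no gold examples to avoid division by zero.
    return {lab: hits[lab] / support[lab] for lab in LABELS if support[lab]}


# Toy, invented labels: a model that over-predicts "worse" loses recall on "stable".
gold = ["worse", "worse", "stable", "stable", "new"]
pred = ["worse", "worse", "worse", "stable", "new"]
print(per_class_recall(gold, pred))
```

A single accuracy over these toy labels (4 of 5 correct) would hide the fact that only half of the "stable" cases were recovered.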

Abstract

Chest radiograph interpretation requires temporal reasoning over prior and current studies, yet most vision-language models are trained on static image-report pairs and lack explicit supervision for modeling longitudinal change. We introduce CheXTemporal, a dataset for temporally grounded reasoning in chest radiography consisting of paired prior-current chest X-rays (CXR) with finding-level temporal and spatial annotations. The dataset includes a five-class progression taxonomy (new, worse, stable, improved, resolved), localized spatial supervision of pathology, explicit spatial-temporal alignment across paired studies, and multi-source coverage for cross-domain evaluation. We additionally construct a 280K-pair silver dataset with automatically derived temporal and anatomical supervision for large-scale evaluation under weaker supervision. Using these resources, we evaluate multiple…
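The abstract describes each example as a prior-current study pair whose findings carry a progression label and localized spatial supervision on both studies. One way to picture such a record is as a small typed schema. The sketch below is a minimal illustration under stated assumptions, not the dataset's released format: the class names, the bounding-box representation `(x_min, y_min, x_max, y_max)`, the file names, and the consistency rules for "new" and "resolved" findings are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Progression(Enum):
    # Five-class progression taxonomy from the abstract.
    NEW = "new"
    WORSE = "worse"
    STABLE = "stable"
    IMPROVED = "improved"
    RESOLVED = "resolved"


# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class FindingAnnotation:
    finding: str
    progression: Progression
    prior_box: Optional[Box]    # location on the prior study, if visible
    current_box: Optional[Box]  # location on the current study, if visible

    def __post_init__(self):
        # Assumed consistency rules: a "new" finding has nothing to localize
        # on the prior study, and a "resolved" one has nothing on the current.
        if self.progression is Progression.NEW and self.prior_box is not None:
            raise ValueError("a new finding should have no prior-study box")
        if self.progression is Progression.RESOLVED and self.current_box is not None:
            raise ValueError("a resolved finding should have no current-study box")


@dataclass(frozen=True)
class StudyPair:
    prior_image: str
    current_image: str
    findings: Tuple[FindingAnnotation, ...]


# Invented example record.
pair = StudyPair(
    prior_image="prior.png",
    current_image="current.png",
    findings=(
        FindingAnnotation("pleural effusion", Progression.WORSE,
                          (40, 300, 180, 420), (35, 280, 200, 450)),
        FindingAnnotation("consolidation", Progression.NEW,
                          None, (220, 150, 330, 260)),
    ),
)
print([f.progression.value for f in pair.findings])
```

Keeping a box slot for each study is what makes spatial-temporal alignment explicit: a "worse" effusion can be compared region to region across the pair rather than only through report text.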