ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports

Mohammed Baharoon; Luyang Luo; Michael Moritz; Abhinav Kumar; Sung Eun Kim; Xiaoman Zhang; Miao Zhu; Mahmoud Hussain Alabbad; Maha Sbayel Alhazmi; Neel P. Mistry; Lucas Bijnens; Kent Ryan Kleinschmidt; Brady Chrisler; Sathvik Suryadevara; Sri Sai Dinesh Jaliparthi; Noah Michael Prudlo; Mark David Marino; Jeremy Palacio; Rithvik Akula; Di Zhou; Hong-Yu Zhou; Ibrahim Ethem Hamamci; Scott J. Adams; Hassan Rayhan AlOmaish; Pranav Rajpurkar

arXiv:2507.22030 · eess.IV · October 28, 2025


TL;DR

ReXGroundingCT is the first publicly available dataset linking free-text radiology findings to pixel-level 3D segmentations in chest CT scans, supporting research in automated report generation and CT image analysis.

Contribution

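The paper introduces the first large-scale, publicly available dataset that connects free-text radiology reports with pixel-level 3D segmentations in chest CT scans. GPT-4 is used to extract and standardize findings from the reports, while the 3D segmentations are quality-assured (training set) or fully annotated (validation and test sets) by board-certified radiologists.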
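The core unit of such a dataset is a free-text finding paired with a 3D mask over the CT volume. The sketch below shows one way to represent that pairing, assuming a sparse set-of-voxels mask. The class `GroundedFinding`, its fields, and the example category path are hypothetical illustrations and do not describe the released file format.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) index into the CT volume


@dataclass(frozen=True)
class GroundedFinding:
    """Hypothetical record pairing one free-text finding with a sparse 3D mask."""

    text: str                       # standardized finding sentence
    category_path: Tuple[str, ...]  # hypothetical path in a category hierarchy
    voxels: FrozenSet[Voxel]        # voxels labeled as belonging to the finding

    def bounding_box(self) -> Tuple[Voxel, Voxel]:
        """Return the inclusive (min, max) corners of the mask."""
        if not self.voxels:
            raise ValueError("an empty mask has no bounding box")
        zs, ys, xs = zip(*self.voxels)
        return (min(zs), min(ys), min(xs)), (max(zs), max(ys), max(xs))

    def to_dense(self, shape: Tuple[int, int, int]) -> List[List[List[int]]]:
        """Expand the sparse mask into a dense 0/1 volume indexed [z][y][x]."""
        depth, height, width = shape
        volume = [[[0] * width for _ in range(height)] for _ in range(depth)]
        for z, y, x in self.voxels:
            if not (0 <= z < depth and 0 <= y < height and 0 <= x < width):
                raise ValueError(f"voxel {(z, y, x)} lies outside shape {shape}")
            volume[z][y][x] = 1
        return volume


finding = GroundedFinding(
    text="Small nodule in the right upper lobe.",
    category_path=("lung", "nodule"),
    voxels=frozenset({(1, 2, 3), (2, 2, 4)}),
)
print(finding.bounding_box())  # ((1, 2, 3), (2, 2, 4))
```

Dense arrays are the more common choice for full-size CT volumes; the sparse set here only keeps the example free of third-party dependencies.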

Findings

1. Contains 16,301 annotated entities across 8,028 pairs.
2. Includes a hierarchical anatomical reasoning dataset.
3. Provides a public leaderboard for model benchmarking.

Abstract

We introduce ReXGroundingCT, the first publicly available dataset linking free-text findings to pixel-level 3D segmentations in chest CT scans. The dataset includes 3,142 non-contrast chest CT scans paired with standardized radiology reports from CT-RATE. Construction followed a structured three-stage pipeline. First, GPT-4 was used to extract and standardize findings, descriptors, and metadata from reports originally written in Turkish and machine-translated into English. Second, GPT-4o-mini categorized each finding into a hierarchical ontology of lung and pleural abnormalities. Third, 3D annotations were produced for all CT volumes: the training set was quality-assured by board-certified radiologists, and the validation and test sets were fully annotated by board-certified radiologists. Additionally, a complementary chain-of-thought dataset was created to provide step-by-step…
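The second stage of the pipeline assigns each finding to a node in a hierarchical ontology of lung and pleural abnormalities. The sketch below illustrates only the tree structure and the root-to-leaf path lookup. It replaces the GPT-4o-mini categorization with naive keyword matching, and the category names in `ONTOLOGY` and the functions `category_path` and `categorize` are hypothetical, not the dataset's actual ontology or code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology tree: parent -> children. Category names are
# illustrative only and are NOT the ontology used by ReXGroundingCT.
ONTOLOGY: Dict[str, List[str]] = {
    "abnormality": ["lung", "pleura"],
    "lung": ["nodule", "consolidation", "ground-glass opacity"],
    "pleura": ["pleural effusion", "pleural thickening"],
}

# Child -> parent index, built once from the tree.
PARENT: Dict[str, str] = {
    child: parent for parent, children in ONTOLOGY.items() for child in children
}


def category_path(leaf: str) -> Tuple[str, ...]:
    """Walk from a category up to the root and return the root-first path."""
    path = [leaf]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return tuple(reversed(path))


def categorize(finding: str) -> Optional[Tuple[str, ...]]:
    """Stand-in for the LLM categorization step: naive keyword matching.

    Returns the ontology path of the first leaf whose name appears in the
    finding text, or None if no leaf matches.
    """
    text = finding.lower()
    # Leaves are nodes with no children; try longer names first so a more
    # specific name is matched before any shorter name it might contain.
    leaves = sorted((c for c in PARENT if c not in ONTOLOGY), key=len, reverse=True)
    for leaf in leaves:
        if leaf in text:
            return category_path(leaf)
    return None


print(categorize("Small right pleural effusion is noted."))
# ('abnormality', 'pleura', 'pleural effusion')
```

Storing the hierarchy as a parent index makes it cheap to report a finding at any level of granularity, for example grouping every lung finding together when summarizing results.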


Code & Models

Models

SutanRifkyt/K2-Inhale (Hugging Face model)
