PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation

Daniel C. Castro; Aurelia Bustos; Shruthi Bannur; Stephanie L. Hyland; Kenza Bouzid; Maria Teodora Wetscherek; Maria Dolores Sánchez-Valverde; Lara Jaques-Pérez; Lourdes Pérez-Rodríguez; Kenji Takeda; José María Salinas; Javier Alvarez-Valle; Joaquín Galant Herrero; Antonio Pertusa

arXiv:2411.05085·cs.AI·September 4, 2025

TL;DR

PadChest-GR is a bilingual (English and Spanish) chest X-ray dataset with sentence-level finding annotations and bounding boxes, designed for training grounded radiology report generation models, which describe individual findings and localise each one on the image.

Contribution

This work introduces the first manually curated dataset for grounded radiology report generation in chest X-rays, including bilingual annotations and localization data.

Findings

01. Provides 4,555 annotated CXR studies with bilingual reports.

02. Includes detailed localization with bounding boxes for findings.

03. Enables training and evaluation of grounded report generation models.

Abstract

Radiology report generation (RRG) aims to create free-text radiology reports from clinical imaging. Grounded radiology report generation (GRRG) extends RRG by including the localisation of individual findings on the image. Currently, there are no manually annotated chest X-ray (CXR) datasets to train GRRG models. In this work, we present PadChest-GR (Grounded-Reporting), a dataset derived from PadChest and aimed at training GRRG models for CXR images. We curate a public bilingual dataset of 4,555 CXR studies with grounded reports (3,099 abnormal and 1,456 normal), each containing complete lists of sentences describing individual present (positive) and absent (negative) findings in English and Spanish. In total, PadChest-GR contains 7,037 positive and 3,422 negative finding sentences. Every positive finding sentence is associated with up to two independent sets of bounding boxes…
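The record structure the abstract describes can be pictured with a minimal Python sketch. It is illustrative only and is not the dataset's actual release format: the class names, field names, and the normalised box layout are all hypothetical, and the rule that negative sentences carry no boxes is an assumption (the abstract mentions boxes only for positive findings).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max), normalised to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class FindingSentence:
    """One finding sentence, in both languages (field names are hypothetical)."""
    text_en: str
    text_es: str
    positive: bool  # True = finding present, False = finding absent
    # One inner list per independent set of boxes; the abstract states
    # that a positive sentence has up to two such sets.
    box_sets: List[List[Box]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.box_sets) > 2:
            raise ValueError("at most two independent box sets per sentence")
        # Assumption: negative (absent) findings are not localised.
        if not self.positive and self.box_sets:
            raise ValueError("negative findings carry no boxes in this sketch")


@dataclass
class GroundedReport:
    """A grounded report for one CXR study (hypothetical schema)."""
    study_id: str
    findings: List[FindingSentence]

    def sentences(self, positive: bool) -> List[FindingSentence]:
        """Return only the positive or only the negative finding sentences."""
        return [f for f in self.findings if f.positive == positive]


# Counts quoted in the abstract.
N_ABNORMAL, N_NORMAL, N_STUDIES = 3_099, 1_456, 4_555
N_POSITIVE, N_NEGATIVE = 7_037, 3_422

# The abnormal and normal studies add up to the stated study total.
assert N_ABNORMAL + N_NORMAL == N_STUDIES
TOTAL_SENTENCES = N_POSITIVE + N_NEGATIVE
```

A report with one localised positive finding and one negative finding could then be built as `GroundedReport("example-study", [FindingSentence("Cardiomegaly.", "Cardiomegalia.", True, [[(0.3, 0.4, 0.7, 0.8)]]), FindingSentence("No pleural effusion.", "Sin derrame pleural.", False)])`, where the study identifier, sentences, and coordinates are made-up placeholders.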
