RadGraph: Extracting Clinical Entities and Relations from Radiology Reports
Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven QH Truong, Du Nguyen Duong, Tan Bui, Pierre Chambon, Yuhao Zhang, Matthew P. Lungren, Andrew Y. Ng, Curtis P. Langlotz, Pranav Rajpurkar

TL;DR
RadGraph is a new dataset and model for extracting structured clinical entities and relations from radiology reports, enabling advanced healthcare applications and research in medical NLP and multimodal learning.
Contribution
The paper introduces RadGraph, a dataset of board-certified radiologist annotations for entities and relations in chest X-ray radiology reports, and RadGraph Benchmark, a deep learning model that reaches a micro F1 of 0.82 on relation extraction.
Findings
The RadGraph development dataset contains 14,579 entities and 10,889 relations across 500 radiology reports from MIMIC-CXR.
RadGraph Benchmark achieves a micro F1 score of 0.82 on relation extraction.
The dataset and model facilitate research in medical NLP and multimodal learning.
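For readers unfamiliar with the metric, micro F1 pools true positives, false positives, and false negatives over all reports and relation types before computing precision and recall. The following is a minimal sketch, assuming each relation is represented as a (head, relation type, tail) tuple and that predictions are matched exactly. The tuple representation, the function name `micro_f1`, and the toy labels are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (head entity, relation type, tail entity).
Relation = Tuple[str, str, str]


def micro_f1(gold_docs: List[Set[Relation]], pred_docs: List[Set[Relation]]) -> float:
    """Micro-averaged F1 over exact-match relation triples."""
    tp = fp = fn = 0
    # Pool counts across all reports before computing precision and recall.
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # predicted and annotated
        fp += len(pred - gold)   # predicted but not annotated
        fn += len(gold - pred)   # annotated but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data with made-up triples, for illustration only.
gold = [
    {("opacity", "located_at", "lung"), ("effusion", "located_at", "pleural")},
    {("opacity", "suggestive_of", "pneumonia")},
]
pred = [
    {("opacity", "located_at", "lung")},
    {("opacity", "suggestive_of", "pneumonia"), ("opacity", "located_at", "heart")},
]
score = micro_f1(gold, pred)
```

On this toy data the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3. Because counts are pooled, reports with many relations weigh more heavily than reports with few.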
Abstract
Extracting structured clinical information from free-text radiology reports can enable the use of radiology report information for a variety of critical healthcare applications. In our work, we present RadGraph, a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema we designed to structure radiology reports. We release a development dataset, which contains board-certified radiologist annotations for 500 radiology reports from the MIMIC-CXR dataset (14,579 entities and 10,889 relations), and a test dataset, which contains two independent sets of board-certified radiologist annotations for 100 radiology reports split equally across the MIMIC-CXR and CheXpert datasets. Using these datasets, we train and test a deep learning model, RadGraph Benchmark, that achieves a micro F1 of 0.82 and 0.73 on relation extraction on…
Code & Models
- 🤗 google/medgemma-1.5-4b-it · model · 86k dl · ♡ 536
- 🤗 google/medgemma-4b-it · model · 170k dl · ♡ 925
- 🤗 unsloth/medgemma-27b-it-GGUF · model · 4.4k dl · ♡ 38
- 🤗 google/medgemma-4b-pt · model · 1.1k dl · ♡ 148
- 🤗 google/medgemma-27b-it · model · 107k dl · ♡ 330
- 🤗 pszemraj/medgemma-4b-it-heretic · model · 46 dl · ♡ 5
- 🤗 unsloth/medgemma-1.5-4b-it-GGUF · model · 6.7k dl · ♡ 33
- 🤗 unsloth/medgemma-4b-it · model · 1.1k dl · ♡ 7
- 🤗 unsloth/medgemma-4b-it-GGUF · model · 11k dl · ♡ 63
- 🤗 unsloth/medgemma-4b-it-unsloth-bnb-4bit · model · 2.1k dl · ♡ 1
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
