RadGraph: Extracting Clinical Entities and Relations from Radiology Reports

Saahil Jain; Ashwin Agrawal; Adriel Saporta; Steven QH Truong; Du Nguyen Duong; Tan Bui; Pierre Chambon; Yuhao Zhang; Matthew P. Lungren; Andrew Y. Ng; Curtis P. Langlotz; Pranav Rajpurkar

arXiv:2106.14463 · cs.CL · August 31, 2021 · 67 citations

TL;DR

RadGraph is a new dataset and model for extracting structured clinical entities and relations from radiology reports, enabling advanced healthcare applications and research in medical NLP and multimodal learning.

Contribution

The paper introduces RadGraph, a dataset of board-certified radiologist annotations for entities and relations in chest X-ray radiology reports, and RadGraph Benchmark, a deep learning model that achieves micro F1 scores of 0.82 and 0.73 on relation extraction.

Findings

01

The RadGraph development dataset contains 14,579 entities and 10,889 relations across 500 radiology reports from MIMIC-CXR.

02

RadGraph Benchmark achieves micro F1 scores of 0.82 and 0.73 on relation extraction.

03

The dataset and model facilitate research in medical NLP and multimodal learning.

Abstract

Extracting structured clinical information from free-text radiology reports can enable the use of radiology report information for a variety of critical healthcare applications. In our work, we present RadGraph, a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema we designed to structure radiology reports. We release a development dataset, which contains board-certified radiologist annotations for 500 radiology reports from the MIMIC-CXR dataset (14,579 entities and 10,889 relations), and a test dataset, which contains two independent sets of board-certified radiologist annotations for 100 radiology reports split equally across the MIMIC-CXR and CheXpert datasets. Using these datasets, we train and test a deep learning model, RadGraph Benchmark, that achieves a micro F1 of 0.82 and 0.73 on relation extraction on…
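The micro F1 reported for relation extraction pools true positives, false positives, and false negatives over all reports before computing precision and recall. The sketch below illustrates that pooling under stated assumptions: a relation is modeled as a `(head, label, tail)` triple, a prediction counts as correct only on an exact triple match, and the labels `rel_a` and `rel_b` and the example spans are invented for illustration. The matching criteria actually used in the paper are not described on this page.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (head_span, relation_label, tail_span).
Relation = Tuple[str, str, str]


def micro_f1(gold_reports: List[Set[Relation]],
             pred_reports: List[Set[Relation]]) -> float:
    """Micro-averaged F1: pool counts over all reports, then score once."""
    if len(gold_reports) != len(pred_reports):
        raise ValueError("gold and predicted report lists must align")

    tp = fp = fn = 0
    for gold, pred in zip(gold_reports, pred_reports):
        tp += len(gold & pred)   # triples found in both sets
        fp += len(pred - gold)   # predicted but not annotated
        fn += len(gold - pred)   # annotated but not predicted

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: two reports with illustrative spans and labels.
example_gold = [
    {("effusion", "rel_a", "pleural"), ("opacity", "rel_a", "lung")},
    {("edema", "rel_b", "failure")},
]
example_pred = [
    {("effusion", "rel_a", "pleural")},
    {("edema", "rel_b", "failure"), ("opacity", "rel_b", "lung")},
]

# Pooled counts: tp=2, fp=1, fn=1, so precision = recall = F1 = 2/3.
example_score = micro_f1(example_gold, example_pred)
```

Because counts are pooled before scoring, reports with many relations weigh more heavily than short ones, which is the usual distinction between micro and macro averaging.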

Code & Models

Repositories

rajpurkarlab/cxr-report-metric
pytorch

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques