MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs
Alistair E. W. Johnson, Tom J. Pollard, Nathaniel R. Greenbaum, Matthew P. Lungren, Chih-ying Deng, Yifan Peng, Zhiyong Lu, Roger G. Mark, Seth J. Berkowitz, Steven Horng

TL;DR
MIMIC-CXR-JPG v2.0.0 is a large, publicly available dataset of 377,110 chest X-ray images from 227,827 imaging studies, with labels derived from the associated radiology reports, designed to support research in automated medical image analysis.
Contribution
This paper introduces a large, de-identified chest X-ray dataset with standardized labels and data splits to advance medical computer vision research.
Findings
Provides a comprehensive dataset for training deep learning models.
Includes 14 standardized labels derived from free-text radiology reports by two natural language processing tools.
Facilitates reproducibility and benchmarking in medical image analysis.
Abstract
Chest radiography is an extremely powerful imaging modality, allowing for a detailed inspection of a patient's thorax, but requiring specialized training for proper interpretation. With the advent of high performance general purpose computer vision algorithms, the accurate automated analysis of chest radiographs is becoming increasingly of interest to researchers. However, a key challenge in the development of these techniques is the lack of sufficient data. Here we describe MIMIC-CXR-JPG v2.0.0, a large dataset of 377,110 chest x-rays associated with 227,827 imaging studies sourced from the Beth Israel Deaconess Medical Center between 2011 - 2016. Images are provided with 14 labels derived from two natural language processing tools applied to the corresponding free-text radiology reports. MIMIC-CXR-JPG is derived entirely from the MIMIC-CXR database, and aims to provide a convenient…
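As a rough illustration of the figures in the abstract, the sketch below computes the mean number of images per study and tallies label values in a per-study label table. Everything beyond the two counts is an assumption made for illustration: the table is a tiny invented example held in memory, the column names `study_id`, `Label A`, and `Label B` are hypothetical, and the value coding (1.0 positive, 0.0 negative, -1.0 uncertain, blank for not mentioned) is assumed rather than stated in the abstract.

```python
import csv
import io
from collections import Counter

# Counts quoted in the abstract.
N_IMAGES = 377_110
N_STUDIES = 227_827
MEAN_IMAGES_PER_STUDY = N_IMAGES / N_STUDIES

# ASSUMPTION: value coding for report-derived labels; not stated in the abstract.
VALUE_NAMES = {
    "1.0": "positive",
    "0.0": "negative",
    "-1.0": "uncertain",
    "": "not mentioned",
}

# Invented toy table with hypothetical column names; not real dataset rows.
TOY_LABELS_CSV = """study_id,Label A,Label B
s1,1.0,
s2,0.0,-1.0
s3,-1.0,1.0
"""


def summarize_labels(csv_text, id_column="study_id"):
    """Count how often each label column takes each coded value."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column == id_column:
                continue
            name = VALUE_NAMES.get((value or "").strip(), "other")
            counts.setdefault(column, Counter())[name] += 1
    return counts


if __name__ == "__main__":
    print(f"mean images per study: {MEAN_IMAGES_PER_STUDY:.2f}")
    for label, tally in summarize_labels(TOY_LABELS_CSV).items():
        print(label, dict(tally))
```

Keeping uncertain and unmentioned findings as separate categories, rather than folding them into negatives, leaves the choice of how to treat them to whoever trains a model on the labels.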
Taxonomy
Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · Lung Cancer Diagnosis and Treatment
