VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations
Ha Q. Nguyen, Khanh Lam, Linh T. Le, Hieu H. Pham, Dat Q. Tran, Dung B. Nguyen, Dung D. Le, Chi M. Pham, Hang T. T. Tong, Diep H. Dinh, Cuong D. Do, Luu T. Doan, Cuong N. Nguyen, Binh T. Nguyen, Que V. Nguyen, Au D. Hoang, Hien N. Phan, Anh T. Nguyen, Phuong H. Ho, Dat T. Ngo

TL;DR
This paper introduces VinDr-CXR, a publicly available dataset of 18,000 chest X-rays, drawn from more than 100,000 scans collected at two hospitals in Vietnam, with radiologist annotations for abnormalities and diseases, aimed at advancing machine learning in medical imaging.
Contribution
The creation and release of a comprehensive, annotated chest X-ray dataset with location-specific labels and a dedicated labeling platform, supporting improved detection and localization algorithms.
Findings
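18,000 images manually annotated by 17 radiologists with 22 local (rectangle) labels and 6 global labels
Dataset divided into a training set of 15,000 scans, each labeled independently by 3 radiologists, and a test set of 3,000 scans, each labeled by the consensus of 5 radiologists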
Public availability of images and labels to facilitate research
Abstract
Most of the existing chest X-ray datasets include labels from a list of findings without specifying their locations on the radiographs. This limits the development of machine learning algorithms for the detection and localization of chest abnormalities. In this work, we describe a dataset of more than 100,000 chest X-ray scans that were retrospectively collected from two major hospitals in Vietnam. Out of this raw data, we release 18,000 images that were manually annotated by a total of 17 experienced radiologists with 22 local labels of rectangles surrounding abnormalities and 6 global labels of suspected diseases. The released dataset is divided into a training set of 15,000 and a test set of 3,000. Each scan in the training set was independently labeled by 3 radiologists, while each scan in the test set was labeled by the consensus of 5 radiologists. We designed and built a labeling…
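The annotation scheme described in the abstract can be pictured as a small data structure. The sketch below is illustrative only: the class names, field names, reader IDs, and finding names are hypothetical and do not reflect the released file format. Intersection over union (IoU) is not mentioned in the abstract; it is used here only as one common way to compare rectangles drawn by different radiologists on the same training scan.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Split sizes and reader count as stated in the abstract.
TRAIN_SIZE, TEST_SIZE = 15_000, 3_000
READERS_PER_TRAIN_SCAN = 3


@dataclass(frozen=True)
class BoxLabel:
    """One rectangle drawn by one radiologist (hypothetical schema)."""
    reader_id: str
    finding: str  # one of the 22 local labels; names here are placeholders
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class AnnotatedScan:
    """A scan with its local (box) labels and per-reader global labels."""
    scan_id: str
    boxes: list[BoxLabel] = field(default_factory=list)
    # reader_id -> set of global (whole-image) labels; hypothetical layout
    global_labels: dict[str, set[str]] = field(default_factory=dict)


def boxes_by_reader(scan: AnnotatedScan) -> dict[str, list[BoxLabel]]:
    """Group a scan's rectangles by the radiologist who drew them."""
    grouped: dict[str, list[BoxLabel]] = {}
    for box in scan.boxes:
        grouped.setdefault(box.reader_id, []).append(box)
    return grouped


def iou(a: BoxLabel, b: BoxLabel) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Usage: two readers mark the same placeholder finding on one training scan.
scan = AnnotatedScan(
    scan_id="scan-0001",
    boxes=[
        BoxLabel("R1", "finding_A", 0, 0, 10, 10),
        BoxLabel("R2", "finding_A", 5, 5, 15, 15),
    ],
)
per_reader = boxes_by_reader(scan)
overlap = iou(per_reader["R1"][0], per_reader["R2"][0])
```

Keeping each rectangle tagged with its reader preserves the three independent readings of a training scan, so any merging or agreement rule can be applied later rather than baked into the stored labels.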
Taxonomy
Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · Lung Cancer Diagnosis and Treatment
