COVIDx CXR-4: An Expanded Multi-Institutional Open-Source Benchmark Dataset for Chest X-ray Image-Based Computer-Aided COVID-19 Diagnostics
Yifan Wu, Hayden Gunraj, Chi-en Amy Tai, Alexander Wong

TL;DR
According to the authors, COVIDx CXR-4 is the largest and most diverse open-source chest X-ray dataset for COVID-19 diagnostics. It expands the earlier COVIDx CXR-3 dataset to support better-performing deep learning models.
Contribution
This paper introduces COVIDx CXR-4, a multi-institutional open-source benchmark dataset for COVID-19 chest X-ray analysis that increases both the size and the diversity of the patient cohort relative to COVIDx CXR-3.
Findings
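Expands the total patient cohort by more than 2.66 times relative to COVIDx CXR-3
Includes 84,818 images from 45,342 patients across multiple institutions
Analyzes patient demographics, imaging metadata, and disease distributions to highlight potential dataset biases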
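The headline figures can be sanity-checked with a little arithmetic. The sketch below is illustrative and not taken from the paper: it uses only the numbers quoted in this summary, and the function names are hypothetical. It computes the average number of images per patient and the upper bound on the previous cohort size implied by the "greater than 2.66 times" growth claim.

```python
# Figures quoted in this summary of the paper.
TOTAL_IMAGES = 84_818
TOTAL_PATIENTS = 45_342
MIN_GROWTH_FACTOR = 2.66  # cohort grew by "greater than 2.66 times"


def images_per_patient(images: int, patients: int) -> float:
    """Average number of images contributed by each patient."""
    return images / patients


def max_previous_cohort(patients: int, min_growth: float) -> float:
    """Upper bound on the earlier cohort size.

    If the new cohort is MORE than `min_growth` times the old one,
    the old cohort must be SMALLER than patients / min_growth.
    """
    return patients / min_growth


if __name__ == "__main__":
    avg = images_per_patient(TOTAL_IMAGES, TOTAL_PATIENTS)
    bound = max_previous_cohort(TOTAL_PATIENTS, MIN_GROWTH_FACTOR)
    print(f"Average images per patient: {avg:.2f}")
    print(f"Previous cohort was smaller than about {bound:.0f} patients")
```

The average is only a coarse summary. It says nothing about how images are distributed across patients or institutions, which is the kind of imbalance the paper's bias analysis is meant to surface.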
Abstract
The global ramifications of the COVID-19 pandemic remain significant, exerting persistent pressure on nations even three years after its initial outbreak. Deep learning models have shown promise in improving COVID-19 diagnostics but require diverse and larger-scale datasets to improve performance. In this paper, we introduce COVIDx CXR-4, an expanded multi-institutional open-source benchmark dataset for chest X-ray image-based computer-aided COVID-19 diagnostics. COVIDx CXR-4 expands significantly on the previous COVIDx CXR-3 dataset by increasing the total patient cohort size by greater than 2.66 times, resulting in 84,818 images from 45,342 patients across multiple institutions. We provide extensive analysis on the diversity of the patient demographic, imaging metadata, and disease distributions to highlight potential dataset biases. To the best of the authors' knowledge, COVIDx CXR-4…
Taxonomy
Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · AI in cancer detection
