The pitfalls of using open data to develop deep learning solutions for COVID-19 detection in chest X-rays
Rachael Harkness, Geoff Hall, Alejandro F Frangi, Nishant Ravikumar, Kieran Zucker

TL;DR
This paper critically examines the limitations of using open-source COVID-19 chest X-ray datasets for deep learning, revealing that models trained on such data may not generalize well to real clinical settings and are prone to bias.
Contribution
The study highlights the pitfalls of relying on open-source datasets like COVIDx for COVID-19 detection, emphasizing the need for careful validation on diverse and representative data.
Findings
Models trained on COVIDx perform poorly on external and hospital datasets.
Open-source datasets may contain biases that inflate model performance.
Careful analysis is required to develop clinically useful AI tools.
Abstract
Since the emergence of COVID-19, deep learning models have been developed to identify COVID-19 from chest X-rays. With little to no direct access to hospital data, the AI community relies heavily on public data comprising numerous data sources. Model performance results have been exceptional when training and testing on open-source data, surpassing the reported capabilities of AI in pneumonia detection prior to the COVID-19 outbreak. In this study, impactful models are trained on a widely used open-source dataset and tested on an external test set and a hospital dataset, for the task of classifying chest X-rays into one of three classes: COVID-19, non-COVID pneumonia and no-pneumonia. Classification performance of the models investigated is evaluated through ROC curves, confusion matrices and standard classification metrics. Explainability modules are implemented to explore the image…
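The evaluation described in the abstract (confusion matrices and standard classification metrics for three classes) can be illustrated with a minimal sketch. This is not the authors' code: the class ordering, the function names `confusion_matrix` and `per_class_metrics`, and the toy label lists are all assumptions made for illustration, and the numbers are invented, not results from the paper.

```python
# Illustrative sketch only: class order and toy labels are assumptions,
# not data or results from the paper.
CLASSES = ["COVID-19", "non-COVID pneumonia", "no-pneumonia"]


def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_metrics(matrix):
    """One-vs-rest precision, recall and F1 for each class."""
    n = len(matrix)
    metrics = {}
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                       # true k, predicted other
        fp = sum(matrix[r][k] for r in range(n)) - tp  # predicted k, true other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[CLASSES[k]] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical predictions from one model on two test sets.
y_true = [0, 0, 1, 1, 2, 2]
y_pred_internal = [0, 0, 1, 1, 2, 2]  # same source as the training data
y_pred_external = [0, 2, 1, 0, 2, 2]  # data from a different source

for name, y_pred in [("internal", y_pred_internal), ("external", y_pred_external)]:
    cm = confusion_matrix(y_true, y_pred)
    covid = per_class_metrics(cm)["COVID-19"]
    print(name, cm, covid)
```

Computing the same metrics separately for each test set, rather than pooling them, is what makes a gap between open-source and external or hospital performance visible.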
Taxonomy
Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · Machine Learning in Healthcare
