From Competition to Collaboration: Making Toy Datasets on Kaggle Clinically Useful for Chest X-Ray Diagnosis Using Federated Learning
Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

TL;DR
This study demonstrates how federated learning can combine separate Kaggle chest X-ray datasets to create a single, clinically useful model capable of diagnosing multiple diseases simultaneously, bridging the gap between data science and clinical application.
Contribution
The paper introduces a federated learning approach that integrates separate single-disease datasets for multi-disease diagnosis, improving the clinical utility of toy datasets.
Findings
The global FL model performs comparably to the separately trained baseline models on AUROC.
Federated learning enables multi-disease diagnosis from separate datasets.
Results support FL as a method to make toy datasets clinically relevant.
Abstract
Chest X-ray (CXR) datasets hosted on Kaggle, though useful from a data science competition standpoint, have limited utility in clinical use because of their narrow focus on diagnosing one specific disease. In real-world clinical use, multiple diseases need to be considered since they can co-exist in the same patient. In this work, we demonstrate how federated learning (FL) can be used to make these toy CXR datasets from Kaggle clinically useful. Specifically, we train a single FL classification model (`global`) using two separate CXR datasets -- one annotated for presence of pneumonia and the other for presence of pneumothorax (two common and life-threatening conditions) -- capable of diagnosing both. We compare the performance of the global FL model with models trained separately on both datasets (`baseline`) for two different model architectures. On a standard, naive 3-layer CNN…
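The aggregation step behind a single `global` model can be sketched as follows. This is a minimal illustration, not the authors' implementation: the truncated abstract does not name the aggregation rule, so sample-size-weighted federated averaging (FedAvg) is assumed here, and the function name `fedavg`, the toy parameter arrays, and the site sizes are all hypothetical.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Average per-client parameter lists, weighted by local sample count.

    Assumption: FedAvg-style aggregation; the source does not confirm
    which aggregation rule the paper uses.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_layers)
    ]


# Hypothetical parameters after one round of local training at two sites:
# one holding the pneumonia dataset, the other the pneumothorax dataset.
pneumonia_site = [np.array([1.0, 2.0]), np.array([0.5])]
pneumothorax_site = [np.array([3.0, 4.0]), np.array([1.5])]

# Hypothetical dataset sizes; the larger site contributes more to the average.
global_weights = fedavg(
    [pneumonia_site, pneumothorax_site],
    client_sizes=[100, 300],
)
```

In a full training loop, `global_weights` would be sent back to both sites for the next round of local training, so that only model parameters, never the images themselves, are exchanged between sites.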
Taxonomy
Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · Lung Cancer Diagnosis and Treatment
