Conditional Synthetic Data Generation for Robust Machine Learning Applications with Limited Pandemic Data
Hari Prasanna Das, Ryan Tran, Japjot Singh, Xiangyu Yue, Geoff Tison, Alberto Sangiovanni-Vincentelli, Costas J. Spanos

TL;DR
This paper introduces a hybrid conditional generative model that produces synthetic medical images, making machine learning models more robust during pandemics when labeled data is scarce. The approach is demonstrated on COVID-19 CT scans.
Contribution
It proposes a hybrid model that combines a conditional generative flow with a classifier, along with a semi-supervised approach for synthetic data generation under label scarcity.
Findings
Outperforms existing models in generating realistic conditional CT scan data.
Semi-supervised approach effectively synthesizes data with limited labels.
Improves COVID-19 detection accuracy using synthetic data augmentation.
Abstract
At the onset of a pandemic, such as COVID-19, data with proper labeling/attributes corresponding to the new disease might be unavailable or sparse. Machine Learning (ML) models trained with the available data, which is limited in quantity and poor in diversity, will often be biased and inaccurate. At the same time, ML algorithms designed to fight pandemics must have good performance and be developed in a time-sensitive manner. To tackle the challenges of limited data, and label scarcity in the available data, we propose generating conditional synthetic data, to be used alongside real data for developing robust ML models. We present a hybrid model consisting of a conditional generative flow and a classifier for conditional synthetic data generation. The classifier decouples the feature representation for the condition, which is fed to the flow…
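The abstract is truncated here, so the exact architecture is not recoverable from this page. As a rough illustration of the general idea (a flow whose transformation depends on a condition vector), the following is a minimal sketch of a single conditional affine coupling layer in NumPy. It is a generic normalizing-flow building block, not the authors' model. The class name `ConditionalAffineCoupling`, the layer sizes, the random untrained weights, and the use of a random vector as a stand-in for classifier features are all assumptions made for illustration.

```python
import numpy as np


class ConditionalAffineCoupling:
    """One affine coupling layer whose scale and shift depend on a condition vector.

    Hypothetical illustration only: weights are random and untrained.
    """

    def __init__(self, dim, cond_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2                      # size of the pass-through half
        in_dim = self.d + cond_dim             # pass-through half plus condition
        out_dim = dim - self.d                 # size of the transformed half
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 2 * out_dim))
        self.b2 = np.zeros(2 * out_dim)

    def _scale_shift(self, x_a, cond):
        # Small MLP mapping (pass-through half, condition) to log-scale and shift.
        h = np.tanh(np.concatenate([x_a, cond], axis=1) @ self.W1 + self.b1)
        log_s, t = np.split(h @ self.W2 + self.b2, 2, axis=1)
        return np.tanh(log_s), t               # bounded log-scale for stability

    def forward(self, x, cond):
        # Data -> latent direction; also returns log|det Jacobian| per sample.
        x_a, x_b = x[:, :self.d], x[:, self.d:]
        log_s, t = self._scale_shift(x_a, cond)
        z_b = x_b * np.exp(log_s) + t
        return np.concatenate([x_a, z_b], axis=1), log_s.sum(axis=1)

    def inverse(self, z, cond):
        # Latent -> data direction (used for sampling given a condition).
        z_a, z_b = z[:, :self.d], z[:, self.d:]
        log_s, t = self._scale_shift(z_a, cond)
        x_b = (z_b - t) * np.exp(-log_s)
        return np.concatenate([z_a, x_b], axis=1)


rng = np.random.default_rng(1)
layer = ConditionalAffineCoupling(dim=6, cond_dim=3)
x = rng.normal(size=(4, 6))        # stand-in for flattened image features
cond = rng.normal(size=(4, 3))     # stand-in for classifier-derived condition features
z, log_det = layer.forward(x, cond)
x_rec = layer.inverse(z, cond)     # inverting with the same condition recovers x
```

Because the layer is exactly invertible for a fixed condition, a sample can be drawn by picking a latent `z` and a condition vector and calling `inverse`. A practical model would stack many such layers and train them by maximum likelihood using the returned log-determinant.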
Taxonomy
Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Machine Learning in Healthcare
