TL;DR
This study evaluates the use of GAN-generated synthetic medical images as a privacy-preserving alternative to real data sharing, demonstrating comparable utility under certain conditions and providing practical guidelines for their use.
Contribution
The paper offers a comprehensive benchmark and analysis of GAN-generated medical images, highlighting their potential and limitations for research and data sharing.
Findings
Models trained on synthetic data perform better when the dataset has fewer unique label combinations.
When there are few samples per class, label overfitting starts to affect GAN training.
Radiologists cannot reliably distinguish synthetic from real images at intermediate resolutions.
Abstract
Privacy concerns around sharing personally identifiable information are a major practical barrier to data sharing in medical research. However, in many cases, researchers have no interest in a particular individual's information but rather aim to derive insights at the level of cohorts. Here, we utilize Generative Adversarial Networks (GANs) to create derived medical imaging datasets consisting entirely of synthetic patient data. The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information. We assess the quality of synthetic data generated by two GAN models for chest radiographs with 14 different radiology findings and brain computed tomography (CT) scans with six types of intracranial hemorrhages. We measure the synthetic image quality by the performance difference of predictive models…
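The evaluation idea in the abstract, scoring synthetic data by the performance difference of predictive models, can be illustrated with a small sketch. The abstract is truncated here, so the setup below is an assumption: one model is trained on real data, one on synthetic data, and both are scored on the same held-out real test set. The Gaussian toy data, the nearest-centroid classifier, and all function names are hypothetical stand-ins for the paper's images, GANs, and predictive models.

```python
import random
from statistics import mean


def make_dataset(n, shift, seed):
    """Toy two-class data with two features.

    `shift` moves both class means, a crude stand-in for the
    imperfections of a generative model (assumption).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = (2.0 if label else -2.0) + shift
        features = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
        data.append((features, label))
    return data


def fit_centroids(data):
    """Nearest-centroid 'training': one mean vector per class."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in data if y == label]
        centroids[label] = [mean(p[i] for p in points) for i in range(2)]
    return centroids


def predict(centroids, x):
    """Return the label of the closest class centroid."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def accuracy(centroids, data):
    """Fraction of samples in `data` that are classified correctly."""
    return mean(1.0 if predict(centroids, x) == y else 0.0 for x, y in data)


def performance_gap(real_train, synthetic_train, real_test):
    """Train on real vs. synthetic data, score both on real test data."""
    acc_real = accuracy(fit_centroids(real_train), real_test)
    acc_syn = accuracy(fit_centroids(synthetic_train), real_test)
    return acc_real, acc_syn, acc_real - acc_syn


real_train = make_dataset(500, shift=0.0, seed=0)
synthetic_train = make_dataset(500, shift=0.5, seed=1)  # imperfect "synthetic" data
real_test = make_dataset(500, shift=0.0, seed=2)

acc_real, acc_syn, gap = performance_gap(real_train, synthetic_train, real_test)
print(f"real-trained: {acc_real:.3f}  synthetic-trained: {acc_syn:.3f}  gap: {gap:.3f}")
```

A gap near zero would suggest the synthetic set preserves the signal the model needs. The paper's actual metrics and models may differ from this sketch.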
