Generative Adversarial Networks for Synthetic Data Generation: A Comparative Study
Claire Little, Mark Elliot, Richard Allmendinger, Sahel Shariati Samani

TL;DR
This paper compares various GAN-based methods for generating synthetic census microdata against traditional techniques, evaluating data utility and privacy risks to assess their effectiveness and safety.
Contribution
It provides a comprehensive comparison of GANs and traditional methods for synthetic tabular data generation, focusing on utility and privacy metrics.
Findings
GANs produce high-utility synthetic census data
GANs exhibit different privacy risk profiles compared to traditional methods
The study highlights strengths and limitations of GANs for data synthesis
Abstract
Generative Adversarial Networks (GANs) are gaining increasing attention as a means for synthesising data. So far, much of this work has been applied to use cases outside of the data confidentiality domain, with a common application being the production of artificial images. Here we consider the potential application of GANs for the purpose of generating synthetic census microdata. We employ a battery of utility metrics and a disclosure risk metric (the Targeted Correct Attribution Probability) to compare the data produced by tabular GANs with those produced using orthodox data synthesis methods.
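The abstract names the Targeted Correct Attribution Probability (TCAP) but does not define it. As a rough illustration only, the sketch below follows one common formulation: an intruder knows some key variables, looks for synthetic records whose keys point to a single target value, and the score is how often that inferred value is correct in the original data. The function name `tcap`, the threshold `tau`, the toy records, and the choice to score unmatched keys as 0 are all assumptions made for this example; the paper's exact variant may differ.

```python
from collections import Counter

def tcap(original, synthetic, keys, target, tau=1.0):
    """Mean TCAP over synthetic records whose key -> target link is near-certain.

    Assumed formulation: keep synthetic records whose within-equivalence-class
    attribution probability is at least tau, then score each one by the share
    of original records with the same keys that also have the same target.
    """
    def key_of(record):
        return tuple(record[k] for k in keys)

    # Frequency of each key combination and each (key, target) combination.
    syn_key = Counter(key_of(r) for r in synthetic)
    syn_kt = Counter((key_of(r), r[target]) for r in synthetic)
    orig_key = Counter(key_of(r) for r in original)
    orig_kt = Counter((key_of(r), r[target]) for r in original)

    scores = []
    for (key, value), count in syn_kt.items():
        # Share of synthetic records with this key that carry this target value.
        if count / syn_key[key] < tau:
            continue
        if orig_key[key] == 0:
            score = 0.0  # assumption: a key absent from the original scores 0
        else:
            score = orig_kt[(key, value)] / orig_key[key]
        scores.extend([score] * count)  # one score per synthetic record
    return sum(scores) / len(scores) if scores else 0.0

# Toy, made-up records for illustration only.
original = [
    {"age": "30", "sex": "F", "job": "nurse"},
    {"age": "30", "sex": "F", "job": "nurse"},
    {"age": "40", "sex": "M", "job": "clerk"},
    {"age": "40", "sex": "M", "job": "chef"},
]
synthetic = [
    {"age": "30", "sex": "F", "job": "nurse"},
    {"age": "40", "sex": "M", "job": "clerk"},
    {"age": "50", "sex": "F", "job": "chef"},
]
risk = tcap(original, synthetic, keys=["age", "sex"], target="job")
```

In this sketch a higher value means the synthetic data more often lets an intruder infer the correct target value, so it indicates higher disclosure risk.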
Taxonomy
Topics: Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)
