Generative Adversarial Networks for Synthetic Data Generation: A Comparative Study

Claire Little; Mark Elliot; Richard Allmendinger; Sahel Shariati Samani

arXiv:2112.01925·cs.LG·December 6, 2021·22 cites

TL;DR

This paper compares various GAN-based methods for generating synthetic census microdata against traditional techniques, evaluating data utility and privacy risks to assess their effectiveness and safety.

Contribution

It provides a comprehensive comparison of GANs and traditional methods for synthetic tabular data generation, focusing on utility and privacy metrics.

Findings

1. GANs produce high-utility synthetic census data
2. GANs exhibit different privacy risk profiles compared to traditional methods
3. The study highlights strengths and limitations of GANs for data synthesis

Abstract

Generative Adversarial Networks (GANs) are gaining increasing attention as a means for synthesising data. So far, much of this work has been applied to use cases outside of the data confidentiality domain, with a common application being the production of artificial images. Here we consider the potential application of GANs for the purpose of generating synthetic census microdata. We employ a battery of utility metrics and a disclosure risk metric (the Targeted Correct Attribution Probability) to compare the data produced by tabular GANs with those produced using orthodox data synthesis methods.
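
The abstract names the Targeted Correct Attribution Probability (TCAP) but does not define it. As a rough illustration only, the sketch below follows one commonly described formulation: an intruder knows some key variables, looks for synthetic records whose key combination points to a single target value, and checks how often that value is correct in the original data. The function name `tcap`, the row-as-dict layout, the column names, and the choice to score unmatched keys as zero are all assumptions made for this example; the paper's exact procedure may differ.

```python
from collections import Counter, defaultdict


def tcap(original, synthetic, keys, target):
    """Simplified TCAP sketch (hypothetical helper, not the paper's code).

    original, synthetic: lists of dicts, one dict per record.
    keys: column names the intruder is assumed to know.
    target: column name the intruder tries to infer.
    """
    def key_of(row):
        return tuple(row[k] for k in keys)

    # Count target values for each key combination in both datasets.
    syn_counts = defaultdict(Counter)
    for row in synthetic:
        syn_counts[key_of(row)][row[target]] += 1
    orig_counts = defaultdict(Counter)
    for row in original:
        orig_counts[key_of(row)][row[target]] += 1

    scores = []
    for row in synthetic:
        k = key_of(row)
        # Keep only synthetic records whose key group agrees on one target value.
        if len(syn_counts[k]) != 1:
            continue
        matches = orig_counts.get(k)
        if not matches:
            # Assumption: a key combination absent from the original scores zero.
            scores.append(0.0)
            continue
        # Share of matching original records that carry the same target value.
        scores.append(matches[row[target]] / sum(matches.values()))

    # Assumption: return 0.0 when no synthetic record qualifies.
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    original = [
        {"age": 30, "sex": "F", "income": "high"},
        {"age": 40, "sex": "M", "income": "low"},
    ]
    synthetic = [{"age": 30, "sex": "F", "income": "high"}]
    print(tcap(original, synthetic, keys=["age", "sex"], target="income"))
```

Under this reading, a higher score means the synthetic data lets an intruder attribute the target value correctly more often, so lower values indicate lower disclosure risk.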

Taxonomy

Topics: Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)