Synthesizing Realistic Test Data without Breaking Privacy
Laura Plein, Alexi Turcotte, Arina Hallemans, Andreas Zeller

TL;DR
This paper introduces a GAN-inspired method for generating synthetic test data that preserves the statistical properties of the original dataset while improving privacy and reducing vulnerability to membership inference and reconstruction attacks.
Contribution
The authors propose a privacy-preserving data generation approach that pairs a test generator (a fuzzer) with a discriminator and uses the original data only indirectly to produce high-utility synthetic datasets.
Findings
High utility of synthetic data demonstrated on four datasets
Reduced vulnerability to membership inference and reconstruction attacks
Statistical properties comparable to those of the original datasets
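One common way to probe privacy findings like these is a distance-to-closest-record (DCR) check: for every synthetic row, measure how far it lies from its nearest original row. This is a generic heuristic, not necessarily the evaluation the authors used, and it is not itself a membership inference or reconstruction attack. The sketch assumes purely numeric rows on comparable scales; the function names are hypothetical.

```python
import math
from typing import Sequence

Row = Sequence[float]


def distance_to_closest_record(synthetic: list[Row], original: list[Row]) -> list[float]:
    """For each synthetic row, the Euclidean distance to its nearest original row."""
    return [min(math.dist(s, o) for o in original) for s in synthetic]


def count_near_copies(synthetic: list[Row], original: list[Row], tol: float = 1e-9) -> int:
    """Count synthetic rows that (almost) reproduce an original row verbatim."""
    return sum(1 for d in distance_to_closest_record(synthetic, original) if d <= tol)


if __name__ == "__main__":
    original = [(1.0, 2.0), (3.0, 4.0)]
    synthetic = [(1.0, 2.0), (5.0, 5.0)]
    print(distance_to_closest_record(synthetic, original))
    print(count_near_copies(synthetic, original))
```

A DCR of 0 flags a verbatim copy of an original record, a kind of leak that such attacks can exploit. Larger distances suggest less memorization, but DCR alone gives no formal privacy guarantee.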
Abstract
There is a need for synthetic training and test datasets that replicate the statistical distributions of original datasets without compromising their confidentiality. Much research has explored leveraging Generative Adversarial Networks (GANs) for synthetic data generation. However, the resulting models are either not accurate enough or remain vulnerable to membership inference attacks (MIA) or dataset reconstruction attacks, since the original data is used in the training process. In this paper, we explore the feasibility of producing a synthetic test dataset with the same statistical properties as the original one while leveraging the original data only indirectly in the generation process. The approach is inspired by GANs, with a generation step and a discrimination step. However, in our approach, we use a test generator (a fuzzer) to produce test data from an input…
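The generation-and-discrimination loop described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: a uniform random generator stands in for the fuzzer, and the discriminator is assumed to see only aggregate statistics (mean and standard deviation) of a single numeric column of the original data, which is one possible reading of "indirectly leveraging" it. The names `TargetStats`, `fuzz_batch`, `discriminate`, and `generate` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class TargetStats:
    """Aggregate statistics of one original column.

    Assumption: these summaries are the only view of the original data
    that the generation loop gets.
    """
    mean: float
    stdev: float


def fuzz_batch(rng: random.Random, low: float, high: float, n: int) -> list[float]:
    """Toy stand-in for a fuzzer: draw n values uniformly from [low, high]."""
    return [rng.uniform(low, high) for _ in range(n)]


def discriminate(batch: list[float], target: TargetStats) -> float:
    """Distance between the batch's mean/stdev and the target's (0 = identical)."""
    n = len(batch)
    mean = sum(batch) / n
    stdev = math.sqrt(sum((x - mean) ** 2 for x in batch) / (n - 1))
    return abs(mean - target.mean) + abs(stdev - target.stdev)


def generate(target: TargetStats, rounds: int = 300, n: int = 200,
             seed: int = 0) -> tuple[list[float], float]:
    """Hill-climb the generator's parameters using discriminator feedback."""
    rng = random.Random(seed)
    low, high = 0.0, 1.0  # hypothetical initial generator parameters
    best = fuzz_batch(rng, low, high, n)
    best_score = discriminate(best, target)
    for _ in range(rounds):
        # Propose a small random change to the generator's value range.
        cand_low = low + rng.gauss(0.0, 1.0)
        cand_high = high + rng.gauss(0.0, 1.0)
        if cand_high <= cand_low:
            continue  # skip empty or inverted ranges
        batch = fuzz_batch(rng, cand_low, cand_high, n)
        score = discriminate(batch, target)
        if score < best_score:
            # Keep the change only if the discriminator rates it as closer.
            low, high, best, best_score = cand_low, cand_high, batch, score
    return best, best_score


if __name__ == "__main__":
    synthetic, score = generate(TargetStats(mean=50.0, stdev=10.0))
    print(f"discriminator score: {score:.3f}")
```

Here simple hill climbing replaces the gradient-based training of a real GAN, so the generator never sees individual original records. Whether and how the paper's discriminator feeds back into its fuzzer is an assumption of this sketch.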
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
