A Framework for Auditable Synthetic Data Generation
Florimond Houssiau, Samuel N. Cohen, Lukasz Szpruch, Owen Daniel, Michaela G. Lawrence, Robin Mitra, Henry Wilde, Callum Mole

TL;DR
This paper introduces a flexible framework for generating synthetic data that allows data controllers to specify and verify which statistical properties are preserved, balancing data utility and privacy in high-dimensional datasets.
Contribution
The authors propose a novel framework enabling explicit control and empirical validation of statistical properties in synthetic data generation.
Findings
Framework allows control over statistical properties
Synthetic data maintains high utility for specific tasks
Empirical validation ensures privacy constraints are met
Abstract
Synthetic data has gained significant momentum thanks to sophisticated machine learning tools that enable the synthesis of high-dimensional datasets. However, many generation techniques do not let the data controller decide which statistical patterns are captured, leading to concerns over privacy protection. While synthetic records are not linked to a particular real-world individual, they can reveal information about users indirectly, which may be unacceptable for data owners. There is thus a need to empirically verify the privacy of synthetic data -- a particularly challenging task for high-dimensional data. In this paper we present a general framework for synthetic data generation that gives data controllers full control over which statistical properties the synthetic data ought to preserve, what exact information loss is acceptable, and how to quantify it. The benefits of the…
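The abstract describes the framework only at a high level. The sketch below is a minimal illustration under our own assumptions, not the authors' method: the controller declares the statistics to preserve as marginals over column tuples, and information loss is measured with total variation distance. The generator, `synthesize_independent`, is a hypothetical independent-column baseline that shuffles each column separately. It preserves every one-way marginal exactly and discards dependencies between columns. The `audit` function then checks each declared statistic against the real data.

```python
import random
from collections import Counter


def marginal(records, cols):
    """Empirical distribution over the joint values of the columns in `cols`."""
    n = len(records)
    counts = Counter(tuple(r[c] for c in cols) for r in records)
    return {k: v / n for k, v in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def synthesize_independent(records, columns, seed=0):
    """Hypothetical generator: shuffle each column independently.

    One-way marginals are preserved exactly. Dependencies between columns
    are (with high probability) destroyed, so they are not leaked."""
    rng = random.Random(seed)
    synth = [dict() for _ in records]
    for col in columns:
        values = [r[col] for r in records]
        rng.shuffle(values)
        for row, v in zip(synth, values):
            row[col] = v
    return synth


def audit(real, synth, statistics):
    """Distance between real and synthetic data for each declared statistic.

    Each statistic is a tuple of column names whose joint marginal is compared."""
    return {
        cols: total_variation(marginal(real, cols), marginal(synth, cols))
        for cols in statistics
    }


if __name__ == "__main__":
    # Toy data in which columns "a" and "b" are perfectly dependent.
    real = [{"a": i % 2, "b": i % 2} for i in range(100)]
    synth = synthesize_independent(real, ["a", "b"], seed=1)
    print(audit(real, synth, [("a",), ("b",), ("a", "b")]))
```

In this toy run, the declared one-way statistics `("a",)` and `("b",)` show zero distance. The undeclared two-way statistic `("a", "b")` typically does not, which makes visible what the generator chose not to reproduce. A real instance of the framework would replace both the generator and the distance measure with whatever the data controller specifies.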
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Traffic Prediction and Management Techniques · Data Management and Algorithms
