Privacy of synthetic data: a statistical framework

March Boedihardjo; Thomas Strohmer; Roman Vershynin

arXiv:2109.01748 · cs.CR · September 7, 2021 · 1 citation

TL;DR

This paper introduces a statistical framework for generating differentially private synthetic data: the population is replaced by a much smaller random subset, and synthetic data are generated on that subset by fitting linear statistics of the true data, with the Laplacian mechanism ensuring privacy.

Contribution

It proposes a statistical framework that circumvents the NP-hardness of accurately generating private synthetic data by treating the true data as a random sample from a population and working on a much smaller, randomly sampled subset of that population.

Findings

01. Provides explicit bounds on privacy and accuracy using the Rényi condition number.

02. Demonstrates the effectiveness of the method in preserving privacy while maintaining data utility.

03. Offers a practical solution to the computational hardness of private synthetic data generation.

Abstract

Privacy-preserving data analysis is emerging as a challenging problem with far-reaching impact. In particular, synthetic data are a promising concept toward solving the aporetic conflict between data privacy and data sharing. Yet, it is known that accurately generating private, synthetic data of certain kinds is NP-hard. We develop a statistical framework for differentially private synthetic data, which enables us to circumvent the computational hardness of the problem. We consider the true data as a random sample drawn from a population Omega according to some unknown density. We then replace Omega by a much smaller random subset Omega^*, which we sample according to some known density. We generate synthetic data on the reduced space Omega^* by fitting the specified linear statistics obtained from the true data. To ensure privacy we use the common Laplacian mechanism. Employing the…
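The pipeline described in the abstract (sample a reduced space, privatize linear statistics with Laplace noise, fit synthetic data on the reduced space) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the population is taken to be the interval [0, 1], the known sampling density is uniform, the test functions take values in [0, 1], and the fitting step uses a simple exponentiated-gradient least-squares fit chosen here for brevity. All function names and parameters are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_linear_statistics(data, test_functions, epsilon, rng):
    """Noisy averages of test functions (assumed to take values in [0, 1])."""
    n, m = len(data), len(test_functions)
    # Replacing one record moves each average by at most 1/n,
    # so the L1 sensitivity of the m averages is at most m/n.
    scale = m / (n * epsilon)
    return [sum(f(x) for x in data) / n + laplace_noise(scale, rng)
            for f in test_functions]


def fit_weights(reduced_space, test_functions, targets, steps=1000, lr=0.2):
    """Fit a probability vector on the reduced space to the target statistics.

    Assumption: exponentiated-gradient descent on the squared error,
    used here only as a simple stand-in for the fitting step.
    """
    k, m = len(reduced_space), len(targets)
    feats = [[f(z) for f in test_functions] for z in reduced_space]  # k x m
    w = [1.0 / k] * k
    for _ in range(steps):
        resid = [sum(w[i] * feats[i][j] for i in range(k)) - targets[j]
                 for j in range(m)]
        grad = [sum(feats[i][j] * resid[j] for j in range(m)) for i in range(k)]
        w = [w[i] * math.exp(-lr * grad[i]) for i in range(k)]
        total = sum(w)
        w = [wi / total for wi in w]
    return w


def synthesize(data, test_functions, epsilon, k, size, seed=0):
    """Return (reduced_space, weights, synthetic_sample)."""
    rng = random.Random(seed)
    # Reduced space: k points drawn from a known density (uniform on [0, 1]),
    # independently of the true data.
    reduced_space = [rng.random() for _ in range(k)]
    targets = private_linear_statistics(data, test_functions, epsilon, rng)
    weights = fit_weights(reduced_space, test_functions, targets)
    synthetic = rng.choices(reduced_space, weights=weights, k=size)
    return reduced_space, weights, synthetic
```

In this sketch the true data influence the output only through the noisy statistics, so everything after `private_linear_statistics` is post-processing of a differentially private release. A call such as `synthesize(data, [lambda x: x, lambda x: x * x], epsilon=1.0, k=50, size=200)` returns synthetic points that all lie in the sampled reduced space.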

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Random Matrices and Applications · Probability and Risk Models