Privacy-Preserving Synthetic Datasets Over Weakly Constrained Domains

Luke Rodriguez; Bill Howe

arXiv:1808.07603 · cs.DB · August 24, 2018 · 1 citation

TL;DR

This paper introduces an algorithm for generating differentially private synthetic datasets over large, weakly constrained domains. It aims to improve data sharing while maintaining privacy, without requiring domain-specific inspection of the data.

Contribution

The paper presents an algorithm that models the unrepresented domain analytically as a probability distribution, enabling privacy-preserving synthetic data generation in realistic open data scenarios without computing the full domain explicitly.

Findings

01 Produces sensible results on real datasets

02 Models unrepresented domains analytically

03 Balances privacy and utility effectively

Abstract

Techniques to deliver privacy-preserving synthetic datasets take a sensitive dataset as input and produce a similar dataset as output while maintaining differential privacy. These approaches have the potential to improve data sharing and reuse, but they must be accessible to non-experts and tolerant of realistic data. Existing approaches make an implicit assumption that the active domain of the dataset is similar to the global domain, potentially violating differential privacy. In this paper, we present an algorithm for generating differentially private synthetic data over the large, weakly constrained domains we find in realistic open data situations. Our algorithm models the unrepresented domain analytically as a probability distribution to adjust the output and compute noise, avoiding the need to compute the full domain explicitly. We formulate the tradeoff between privacy and…
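
The active-domain assumption mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a standard Laplace-noised histogram, written under the assumption of a tiny, hypothetical five-value global domain, and the names `laplace_noise`, `noisy_histogram`, and `global_domain` are invented for illustration. It shows that releasing noisy counts only for values present in the data exposes which values occur, while releasing a noisy count for every value in the global domain does not.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(records, cells, epsilon, rng):
    # Counts over disjoint cells change by at most 1 when one record is
    # added or removed, so Laplace noise with scale 1/epsilon is used.
    counts = {cell: 0 for cell in cells}
    for record in records:
        counts[record] += 1
    return {cell: n + laplace_noise(1.0 / epsilon, rng) for cell, n in counts.items()}


# Hypothetical global domain and two neighboring datasets (assumptions).
global_domain = ["A", "B", "C", "D", "E"]
data = ["A", "A", "B"]
neighbor = data + ["E"]  # differs from `data` by one record

rng = random.Random(0)

# Active-domain release: only values that occur in the data get a cell,
# so the set of released keys depends on the data itself.
active_a = noisy_histogram(data, sorted(set(data)), 1.0, rng)
active_b = noisy_histogram(neighbor, sorted(set(neighbor)), 1.0, rng)

# Global-domain release: every possible value gets a noisy count,
# so the released keys are identical for both datasets.
global_a = noisy_histogram(data, global_domain, 1.0, rng)
global_b = noisy_histogram(neighbor, global_domain, 1.0, rng)

print(sorted(active_a), sorted(active_b))
print(sorted(global_a), sorted(global_b))
```

Enumerating every cell of the global domain, as the second release does, is only practical for very small domains. According to the abstract, the paper's algorithm instead models the unrepresented domain analytically, so that the full domain does not have to be computed explicitly.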

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques