Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data

March Boedihardjo; Thomas Strohmer; Roman Vershynin

arXiv:2107.05824 · cs.CR · August 11, 2022

TL;DR

This paper introduces an approach to synthetic data generation that balances computational efficiency, privacy guarantees, and data utility. It does so by analyzing covariance loss under conditional expectation, which yields a practical solution to a relaxed version of an NP-hard problem.

Contribution

The paper presents a method that uses covariance loss analysis to generate privacy-preserving synthetic data efficiently, with provable privacy guarantees and a rigorous quantification of data utility.

Findings

01. Nearly optimal solutions for covariance loss in conditional expectation.

02. Constructive methods for microaggregation and synthetic data generation.

03. Theoretical insights enabling practical privacy-preserving data release.
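
The link between the first two findings can be made concrete with a small sketch. It reads "covariance loss" as the difference Cov(X) - Cov(E[X | group]), where the conditional expectation is taken empirically by replacing each point with the mean of its group (the basic microaggregation step). By the law of total covariance this difference is always positive semidefinite. The grouping rule, function names, and sample data below are hypothetical and chosen only for illustration. This is not the authors' algorithm, and it adds no noise, so it carries no privacy guarantee on its own.

```python
import random


def mean_vector(points):
    """Coordinate-wise mean of a list of d-dimensional points."""
    n = len(points)
    d = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def covariance(points):
    """Population covariance matrix (divide by n) of a list of points."""
    n = len(points)
    d = len(points[0])
    mu = mean_vector(points)
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
             for j in range(d)] for i in range(d)]


def microaggregate(points, labels):
    """Replace each point by its group mean: an empirical E[X | group]."""
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    centers = {g: mean_vector(ps) for g, ps in groups.items()}
    return [centers[g] for g in labels]


def covariance_loss(points, labels):
    """Entrywise Cov(X) - Cov(E[X | group]) for the given grouping."""
    full = covariance(points)
    agg = covariance(microaggregate(points, labels))
    d = len(full)
    return [[full[i][j] - agg[i][j] for j in range(d)] for i in range(d)]


random.seed(0)
points = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]

# Hypothetical grouping rule: four cells defined by the coordinate signs.
labels = [(p[0] > 0, p[1] > 0) for p in points]

loss = covariance_loss(points, labels)
trace_loss = loss[0][0] + loss[1][1]  # total variance lost by aggregating
print("covariance loss matrix:", loss)
print("trace of the loss:", trace_loss)
```

Two extreme groupings bracket the behavior: putting every point in its own group loses nothing, while putting all points in a single group loses the entire covariance. Finer partitions therefore preserve more covariance but average over fewer records per group.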

Abstract

The protection of private information is of vital importance in data-driven research, business, and government. The conflict between privacy and utility has triggered intensive research in the computer science and statistics communities, who have developed a variety of methods for privacy-preserving data release. Among the main concepts that have emerged are anonymity and differential privacy. Today, another solution is gaining traction: synthetic data. However, the road to privacy is paved with NP-hard problems. In this paper we focus on the NP-hard challenge to develop a synthetic data generation method that is computationally efficient, comes with provable privacy guarantees, and rigorously quantifies data utility. We solve a relaxed version of this problem by studying a fundamental, but at first glance completely unrelated, problem in probability concerning the concept of covariance…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Probability and Risk Models