Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data
March Boedihardjo, Thomas Strohmer, Roman Vershynin

TL;DR
This paper proposes a synthetic data generation method that is computationally efficient, carries provable privacy guarantees, and quantifies data utility. Because the general problem is NP-hard, the authors solve a relaxed version of it by analyzing covariance loss.
Contribution
The paper presents a method that uses an analysis of covariance loss to generate privacy-preserving synthetic data efficiently, with provable privacy guarantees and a quantified measure of utility.
Findings
A nearly optimal answer to how much covariance is lost when taking conditional expectation.
Constructive methods for microaggregation and synthetic data generation.
Theoretical insights enabling practical privacy-preserving data release.
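To make the notion of covariance loss concrete, here is a minimal sketch, not the authors' algorithm. It assumes covariance loss means Cov(X) - Cov(E[X | partition]), treats microaggregation as replacing each record by its group mean, and uses a hypothetical partition (quadrants of the first two coordinates). It adds no privacy noise, so it illustrates utility only. The function names `microaggregate` and `covariance_loss` are illustrative.

```python
import numpy as np

def microaggregate(X, labels):
    """Replace each row of X by the mean of its group.

    With respect to the empirical distribution, this is the conditional
    expectation of X given the partition defined by `labels`.
    """
    Y = np.empty_like(X, dtype=float)
    for g in np.unique(labels):
        mask = labels == g
        Y[mask] = X[mask].mean(axis=0)
    return Y

def covariance_loss(X, Y):
    """Cov(X) - Cov(Y), using population (1/n) covariances."""
    cov_x = np.cov(X, rowvar=False, bias=True)
    cov_y = np.cov(Y, rowvar=False, bias=True)
    return cov_x - cov_y

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Hypothetical partition: quadrant of the first two coordinates (up to 4 groups).
labels = 2 * (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)

Y = microaggregate(X, labels)
loss = covariance_loss(X, Y)

# By the law of total covariance, the loss equals the average within-group
# covariance, so it is positive semidefinite (up to rounding error).
eigs = np.linalg.eigvalsh(loss)
print("smallest eigenvalue of the loss:", eigs.min())
print("trace of the loss:", np.trace(loss))
```

The trace of the loss matrix gives one scalar summary of how much second-order information the averaging discards. A finer partition lowers it, but each group then contains fewer records.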
Abstract
The protection of private information is of vital importance in data-driven research, business, and government. The conflict between privacy and utility has triggered intensive research in the computer science and statistics communities, who have developed a variety of methods for privacy-preserving data release. Among the main concepts that have emerged are anonymity and differential privacy. Today, another solution is gaining traction: synthetic data. However, the road to privacy is paved with NP-hard problems. In this paper we focus on the NP-hard challenge to develop a synthetic data generation method that is computationally efficient, comes with provable privacy guarantees, and rigorously quantifies data utility. We solve a relaxed version of this problem by studying a fundamental, but at first glance completely unrelated, problem in probability concerning the concept of covariance…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Probability and Risk Models
