CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources
Sikha Pentyala, Mayana Pereira, Martine De Cock

TL;DR
CaPS introduces a privacy-preserving framework for collaborative synthetic data generation from distributed sources, combining secure multi-party computation and differential privacy to enable data sharing without trusting a central aggregator.
Contribution
The paper presents a novel framework that allows multiple data holders to generate synthetic data collaboratively while ensuring privacy, applicable to any marginal-based SDG method.
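To make the idea concrete, here is a minimal sketch of how distributed holders could contribute to one noisy marginal without revealing their local counts. It is an illustration under stated assumptions, not the paper's protocol. It assumes horizontally partitioned data, additive secret sharing over a hypothetical ring of size 2**32, and rounded Gaussian noise as an uncalibrated stand-in for a differentially private mechanism. All function names and parameters are hypothetical. A real system would sample the noise inside the secure computation so that no single party ever sees it, whereas here one process simulates every party.

```python
import random
import secrets

MODULUS = 2**32  # hypothetical ring size for additive shares (an assumption)


def share(value, n_parties):
    """Split an integer into n additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares and map the result back to a signed integer."""
    total = sum(shares) % MODULUS
    return total - MODULUS if total >= MODULUS // 2 else total


def local_marginal(column, domain):
    """One-way marginal: count each domain value in one holder's column."""
    return [sum(1 for value in column if value == v) for v in domain]


def secure_noisy_marginal(holder_columns, domain, n_parties=3, sigma=0.0, rng=None):
    """Simulate computing parties that only ever hold shares of the counts."""
    rng = rng or random.Random()
    # party_totals[p][j] is party p's running share of the j-th count.
    party_totals = [[0] * len(domain) for _ in range(n_parties)]

    # Each holder secret-shares its local counts; parties add shares locally.
    for column in holder_columns:
        for j, count in enumerate(local_marginal(column, domain)):
            for p, s in enumerate(share(count, n_parties)):
                party_totals[p][j] = (party_totals[p][j] + s) % MODULUS

    # Noise is secret-shared too, so only the noisy total is ever opened.
    # Rounded Gaussian noise is illustrative only, not a calibrated mechanism.
    for j in range(len(domain)):
        noise = round(rng.gauss(0.0, sigma)) if sigma > 0 else 0
        for p, s in enumerate(share(noise, n_parties)):
            party_totals[p][j] = (party_totals[p][j] + s) % MODULUS

    return [
        reconstruct([party_totals[p][j] for p in range(n_parties)])
        for j in range(len(domain))
    ]
```

With `sigma=0.0` the opened result equals the exact combined counts, which is a convenient check that the sharing arithmetic is right. With `sigma > 0`, individual entries can come out negative, and a marginal-based generator would typically post-process them.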
Findings
The framework is applicable to state-of-the-art SDG algorithms.
The paper demonstrates the scalability and effectiveness of privacy-preserving synthetic data generation.
The framework enables data sharing without trusting a central entity.
Abstract
Data is the lifeblood of the modern world, forming a fundamental part of AI, decision-making, and research advances. With the increase in interest in data, governments have taken important steps towards a regulated data world, drastically impacting data sharing and data usability and resulting in massive amounts of data confined within the walls of organizations. While synthetic data generation (SDG) is an appealing solution to break down these walls and enable data sharing, the main drawback of existing solutions is the assumption of a trusted aggregator for generative model training. Given that many data holders may not want to, or be legally allowed to, entrust a central entity with their raw data, we propose a framework for the collaborative and private generation of synthetic tabular data from distributed data holders. Our solution is general, applicable to any marginal-based SDG, and…
Taxonomy
Topics: Scientific Computing and Data Management · Semantic Web and Ontologies
