Differentially Private Synthetic Data Generation for Relational Databases
Kaveh Alimohammadi, Hao Wang, Ojas Gulati, Akash Srivastava, Navid Azizan

TL;DR
This paper presents a novel differentially private algorithm for generating synthetic relational databases that maintains data utility, preserves relationships, and scales efficiently without flattening the data.
Contribution
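The paper introduces what the authors describe as a first-of-its-kind algorithm that can be combined with existing DP mechanisms to generate synthetic relational databases. It iteratively refines inter-table relationships, avoids flattening the database into a master table, and comes with both DP and theoretical utility guarantees.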
Findings
Effective preservation of data fidelity demonstrated on real datasets
Algorithm scales well to high-dimensional, multi-table data
Provides both DP and theoretical utility guarantees
Abstract
Existing differentially private (DP) synthetic data generation mechanisms typically assume a single-source table. In practice, data is often distributed across multiple tables with relationships across tables. In this paper, we introduce the first-of-its-kind algorithm that can be combined with any existing DP mechanisms to generate synthetic relational databases. Our algorithm iteratively refines the relationship between individual synthetic tables to minimize their approximation errors in terms of low-order marginal distributions while maintaining referential integrity. This algorithm eliminates the need to flatten a relational database into a master table (saving space), operates efficiently (saving time), and scales effectively to high-dimensional data. We provide both DP and theoretical utility guarantees for our algorithm. Through numerical experiments on real-world datasets, we…
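To make the "approximation error in terms of low-order marginal distributions" concrete, here is a minimal, non-private sketch in Python. It is not the paper's algorithm: the function names (`cross_table_marginal`, `l1_error`), the toy tables, and the choice of L1 distance are all assumptions made for illustration. It only shows the kind of quantity a refinement step could try to reduce, namely the gap between a cross-table 2-way marginal of the real database and that of a synthetic one. In a DP setting, the real marginal would be accessed only through a private (noisy) measurement, which this sketch omits.

```python
from collections import Counter


def cross_table_marginal(parent, child, links, parent_col, child_col):
    """Empirical joint distribution of (parent_col, child_col) over linked rows.

    links[i] is the index of the parent row referenced by child row i
    (a hypothetical stand-in for a foreign key). Every entry pointing at a
    valid parent index is what referential integrity means here.
    """
    counts = Counter(
        (parent[links[i]][parent_col], child[i][child_col])
        for i in range(len(child))
    )
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def l1_error(p, q):
    """L1 distance between two marginals stored as dicts (missing keys count as 0)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy two-table database (illustrative data only).
parent = [{"region": "N"}, {"region": "S"}]
child = [{"item": "a"}, {"item": "a"}, {"item": "b"}, {"item": "b"}]

real_links = [0, 0, 1, 1]   # relationships in the "real" database
bad_links = [1, 1, 0, 0]    # synthetic links that scramble the relationship
good_links = [0, 0, 1, 1]   # synthetic links after a hypothetical refinement

real = cross_table_marginal(parent, child, real_links, "region", "item")
bad = cross_table_marginal(parent, child, bad_links, "region", "item")
good = cross_table_marginal(parent, child, good_links, "region", "item")

print(l1_error(real, bad))
print(l1_error(real, good))
```

Both synthetic link assignments are referentially valid, since every child row points at an existing parent row, yet they differ sharply in how well they reproduce the cross-table marginal. That is why the relationship between tables has to be refined separately from the individual tables.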
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Access Control and Trust
