Synthesizing Linked Data Under Cardinality and Integrity Constraints
Amir Gilad, Shweta Patwa, Ashwin Machanavajjhala

TL;DR
This paper presents a novel framework for generating synthetic linked data that satisfies complex cardinality and integrity constraints, addressing a key challenge in data synthesis and privacy preservation.
Contribution
It introduces a new declarative approach to impute missing foreign key values under constraints, proving NP-hardness and proposing an efficient two-phase solution with theoretical guarantees.
Findings
The solution scales well with data size and the number of constraints
Maintains low error rates for cardinality constraints
Guarantees satisfaction of integrity constraints
Abstract
The generation of synthetic data is useful in multiple aspects, from testing applications to benchmarking to privacy preservation. Generating the links between relations, subject to cardinality constraints (CCs) and integrity constraints (ICs), is an important aspect of this problem. Given instances of two relations, where one has a foreign key dependence on the other and is missing its foreign key (FK) values, and two types of constraints: (1) CCs that apply to the join view and (2) ICs that apply to the table with missing values, our goal is to impute the missing values such that the constraints are satisfied. We provide a novel framework for the problem based on declarative CCs and ICs. We further show that the problem is NP-hard and propose a novel two-phase solution that guarantees the satisfaction of the ICs. Phase I yields an intermediate solution accounting for the…
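To make the problem statement above concrete, here is a minimal Python sketch of what checking one candidate FK assignment could look like. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `households` and `persons` tables, the helper names `cc_error` and `ic_violations`, the reading of a CC as a target count for a selection predicate over the join view, and the reading of an IC as a rule forbidding certain pairs of rows from sharing an FK value. It only evaluates an assignment; it does not implement the two-phase solution.

```python
from collections import defaultdict

# Hypothetical toy instance: `persons` is missing its FK into `households`.
households = {1: {"area": "urban"}, 2: {"area": "rural"}}
persons = [
    {"pid": "a", "age": 70, "role": "owner"},
    {"pid": "b", "age": 34, "role": "owner"},
    {"pid": "c", "age": 8, "role": "child"},
]


def cc_error(assignment, predicate, target):
    """Absolute gap between the join-view count for `predicate` and `target`."""
    count = sum(
        1
        for p in persons
        if predicate(p, households[assignment[p["pid"]]])
    )
    return abs(count - target)


def ic_violations(assignment, conflict):
    """Pairs of persons that share an FK value although `conflict` forbids it."""
    by_fk = defaultdict(list)
    for p in persons:
        by_fk[assignment[p["pid"]]].append(p)
    bad = []
    for group in by_fk.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if conflict(group[i], group[j]):
                    bad.append((group[i]["pid"], group[j]["pid"]))
    return bad


# Assumed CC: exactly one person aged 65+ lives in an urban household.
def urban_senior(person, household):
    return person["age"] >= 65 and household["area"] == "urban"


# Assumed IC: two owners may not share a household.
def two_owners(p, q):
    return p["role"] == "owner" and q["role"] == "owner"


candidate = {"a": 1, "b": 2, "c": 2}  # pid -> imputed household id
print(cc_error(candidate, urban_senior, target=1))
print(ic_violations(candidate, two_owners))
```

In this sketch the CC is scored as an error that may be nonzero, while the IC check returns explicit violations, mirroring the page's description that cardinality error is kept low and integrity constraints are always satisfied.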
