Synthesizing Linked Data Under Cardinality and Integrity Constraints

Amir Gilad; Shweta Patwa; Ashwin Machanavajjhala

arXiv:2103.14435 · cs.DB · March 29, 2021

TL;DR

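This paper presents a framework for generating the links between two relations in synthetic data: it imputes missing foreign key values so that cardinality constraints (CCs) on the join view and integrity constraints (ICs) on the table with the missing keys are satisfied. Such links matter wherever synthetic data is used, including application testing, benchmarking, and privacy preservation.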

Contribution

The paper formulates the imputation of missing foreign key values as a problem over declarative CCs and ICs, shows that this problem is NP-hard, and proposes an efficient two-phase solution that guarantees the ICs are satisfied.

Findings

01 The solution scales well with data size and constraints

02 It maintains low error rates for cardinality constraints

03 It guarantees satisfaction of integrity constraints

Abstract

The generation of synthetic data is useful in multiple aspects, from testing applications to benchmarking to privacy preservation. Generating the links between relations, subject to cardinality constraints (CCs) and integrity constraints (ICs), is an important aspect of this problem. Given instances of two relations, where one has a foreign key dependence on the other and is missing its foreign key ($FK$) values, and two types of constraints: (1) CCs that apply to the join view and (2) ICs that apply to the table with missing $FK$ values, our goal is to impute the missing $FK$ values such that the constraints are satisfied. We provide a novel framework for the problem based on declarative CCs and ICs. We further show that the problem is NP-hard and propose a novel two-phase solution that guarantees the satisfaction of the ICs. Phase I yields an intermediate solution accounting for the…
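To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the paper's two-phase solution, which the abstract only outlines; it simply enumerates every possible $FK$ assignment on a toy instance, discards those that violate the ICs, and keeps the one with the smallest total CC error. The relations (`households`, `persons`), their columns, the two CCs, and the single IC (no two heads share a household) are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy data; all names and values are illustrative assumptions.
households = [                      # parent relation (referenced by the FK)
    {"hid": 1, "area": "urban"},
    {"hid": 2, "area": "rural"},
]
persons = [                         # child relation with missing FK values
    {"pid": "a", "age": 40, "role": "head"},
    {"pid": "b", "age": 35, "role": "head"},
    {"pid": "c", "age": 10, "role": "child"},
]

# Each CC is (predicate over one row of the join view, target row count).
ccs = [
    (lambda p, h: p["age"] < 18 and h["area"] == "rural", 1),
    (lambda p, h: h["area"] == "urban", 1),
]


def violates_ic(p1, p2):
    """Assumed IC: two heads must not be linked to the same household."""
    return p1["role"] == "head" and p2["role"] == "head"


def ics_hold(assignment):
    """Check the IC for every pair of persons that share an FK value."""
    for i in range(len(persons)):
        for j in range(i + 1, len(persons)):
            same_fk = assignment[i] == assignment[j]
            if same_fk and violates_ic(persons[i], persons[j]):
                return False
    return True


def cc_error(assignment):
    """Sum of absolute deviations from the CC targets in the join view."""
    by_hid = {h["hid"]: h for h in households}
    total = 0
    for pred, target in ccs:
        count = sum(
            1 for p, hid in zip(persons, assignment) if pred(p, by_hid[hid])
        )
        total += abs(count - target)
    return total


def impute_fk():
    """Exhaustive search: exponential in len(persons), for illustration only."""
    hids = [h["hid"] for h in households]
    best = None
    for assignment in product(hids, repeat=len(persons)):
        if not ics_hold(assignment):
            continue                # ICs are hard constraints
        err = cc_error(assignment)
        if best is None or err < best[0]:
            best = (err, assignment)
    return best


if __name__ == "__main__":
    print(impute_fk())
```

In this sketch the ICs are treated as hard constraints and the CCs as a soft objective, mirroring the summary's claims that IC satisfaction is guaranteed while CC error is kept low. The exhaustive search is exponential in the number of tuples, which is consistent with the NP-hardness result and shows why a more scalable approach is needed on realistic data.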
