Preserving logical and functional dependencies in synthetic tabular data
Chaithra Umesh, Kristian Schultz, Manjunath Mahendra, Saptarshi Bej, Olaf Wolkenhauer

TL;DR
This paper investigates whether current synthetic tabular data generation methods can preserve logical and functional dependencies among attributes, introduces a measure for logical dependencies, and compares how well several algorithms maintain these dependencies.
Contribution
It introduces the concept of logical dependencies, proposes a measure for them, and evaluates how well existing algorithms preserve both logical and functional dependencies in synthetic data.
Findings
Current algorithms often fail to preserve functional dependencies.
Some models can preserve inter-attribute logical dependencies.
The paper identifies research gaps for developing task-specific synthetic data models.
Abstract
Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data is yet to be explored. In addition to the existing notion of functional dependencies, we introduce the notion of logical dependencies among the attributes in this article. Moreover, we provide a measure to quantify logical dependencies among attributes in tabular data. Utilizing this measure, we compare several state-of-the-art synthetic data generation algorithms and test their capability to preserve logical and functional dependencies on several publicly available datasets. We demonstrate that currently available synthetic tabular data generation algorithms do not fully preserve functional dependencies when they generate synthetic datasets. In addition, we show that some tabular synthetic data…
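Functional dependencies have a standard definition: an FD X → Y holds in a table when any two rows that agree on the attributes X also agree on Y. The minimal Python sketch below, assuming rows are plain dicts, shows how such an FD can be checked and how one might count the share of single-attribute FDs from a real table that still hold in a synthetic one. The function names (`fd_holds`, `single_attribute_fds`, `fd_preservation_rate`) and the toy zip/city data are hypothetical illustrations; this is not the paper's evaluation procedure, and it does not implement the paper's logical-dependency measure.

```python
from itertools import permutations

# Hypothetical illustration, not the paper's method: a row is a dict of attribute -> value.
Row = dict[str, object]


def fd_holds(rows: list[Row], lhs: tuple[str, ...], rhs: str) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows,
    i.e. every combination of lhs values maps to exactly one rhs value."""
    seen: dict[tuple, object] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if key in seen:
            if seen[key] != row[rhs]:
                return False  # same lhs values, different rhs value: FD violated
        else:
            seen[key] = row[rhs]
    return True


def single_attribute_fds(rows: list[Row], attributes: list[str]) -> list[tuple[str, str]]:
    """List every single-attribute FD a -> b (a != b) that holds in rows."""
    return [(a, b) for a, b in permutations(attributes, 2) if fd_holds(rows, (a,), b)]


def fd_preservation_rate(real: list[Row], synthetic: list[Row], attributes: list[str]) -> float:
    """Fraction of the FDs found in the real data that still hold in the synthetic data."""
    real_fds = single_attribute_fds(real, attributes)
    if not real_fds:
        return 1.0  # nothing to preserve
    kept = sum(fd_holds(synthetic, (a,), b) for a, b in real_fds)
    return kept / len(real_fds)


# Toy data (hypothetical): in the real table zip -> city and city -> zip both hold.
real = [
    {"zip": "10115", "city": "Berlin", "age": 34},
    {"zip": "10115", "city": "Berlin", "age": 29},
    {"zip": "18055", "city": "Rostock", "age": 34},
]
# A synthetic table in which every row got city "Berlin": zip -> city survives,
# but city -> zip is broken because "Berlin" now maps to two zip codes.
synthetic = [
    {"zip": "10115", "city": "Berlin", "age": 40},
    {"zip": "10115", "city": "Berlin", "age": 33},
    {"zip": "18055", "city": "Berlin", "age": 29},
]

if __name__ == "__main__":
    attrs = ["zip", "city", "age"]
    print(single_attribute_fds(real, attrs))             # [('zip', 'city'), ('city', 'zip')]
    print(fd_preservation_rate(real, synthetic, attrs))  # 0.5
```

One caveat on this design: an attribute whose values are all unique (an ID column, for instance) trivially determines every other attribute, so a practical evaluation would probably filter such trivial FDs, and checking multi-attribute left-hand sides exhaustively grows combinatorially; this sketch handles neither.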
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Semantic Web and Ontologies
