Boosting Synthetic Data Generation with Effective Nonlinear Causal Discovery

Martina Cinquini; Fosca Giannotti; Riccardo Guidotti

arXiv:2301.07427·cs.AI·October 16, 2024

TL;DR

This paper introduces a novel method for generating synthetic tabular data that incorporates nonlinear causal relationships among features, improving plausibility and utility in various AI applications.

Contribution

It presents an efficient framework that discovers nonlinear causalities among features using pattern mining, enhancing synthetic data generation accuracy.

Findings

1. Effective discovery of nonlinear causalities in synthetic data
2. Improved plausibility of generated datasets
3. Validated on synthetic and real datasets with known causalities

Abstract

Synthetic data generation has been widely adopted in software testing, data privacy, imbalanced learning, and artificial intelligence explanation. In all such contexts, it is crucial to generate plausible data samples. A common assumption of widely used data generation approaches is that the features are independent. However, the variables of a dataset typically depend on one another, and ignoring these dependencies during generation leads to implausible records. The main problem is that dependencies among variables are typically unknown. In this paper, we design a synthetic dataset generator for tabular data that can discover nonlinear causalities among the variables and use them at generation time. State-of-the-art methods for nonlinear causal discovery are typically inefficient. We boost them by restricting the causal discovery among the features…
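The point about feature independence can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a two-column dataset in which the second feature depends on the first through a known nonlinear function `f`, whereas the paper discovers such relations automatically. The function names (`sample_independent`, `sample_with_dependency`, `mean_abs_residual`) are hypothetical and introduced only for illustration.

```python
import numpy as np

def sample_independent(data, n, rng):
    """Draw each column separately from its empirical marginal (independence assumption)."""
    cols = [rng.choice(data[:, j], size=n, replace=True) for j in range(data.shape[1])]
    return np.column_stack(cols)

def sample_with_dependency(data, n, rng, f):
    """Draw the cause x from its marginal, then set y = f(x) + a resampled residual.

    Assumes column 0 causes column 1 and that f is already known.
    """
    x = rng.choice(data[:, 0], size=n, replace=True)
    residuals = data[:, 1] - f(data[:, 0])
    noise = rng.choice(residuals, size=n, replace=True)
    return np.column_stack([x, f(x) + noise])

def mean_abs_residual(samples, f):
    """Average distance of each record from the true relation y = f(x)."""
    return float(np.mean(np.abs(samples[:, 1] - f(samples[:, 0]))))

rng = np.random.default_rng(0)

# Toy "real" data: y is a noisy quadratic function of x.
x = rng.uniform(-3, 3, size=2000)
y = x**2 + rng.normal(0, 0.1, size=2000)
real = np.column_stack([x, y])

f = np.square  # assumed known for this illustration only

indep = sample_independent(real, 2000, rng)
causal = sample_with_dependency(real, 2000, rng, f)

err_indep = mean_abs_residual(indep, f)
err_causal = mean_abs_residual(causal, f)
print(f"independent sampling residual: {err_indep:.3f}")
print(f"dependency-aware residual:     {err_causal:.3f}")
```

Records sampled column by column keep the correct marginals but drift far from the curve y = x², which is the kind of implausibility the abstract describes. Records generated through the dependency stay close to it.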

Code & Models

Repositories

marti5ini/gencda
none · Official
