Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework
Ruibo Tu, Zineb Senane, Lele Cao, Cheng Zhang, Hedvig Kjellström, Gustav Eje Henter

TL;DR
This paper introduces a benchmark framework for evaluating tabular data synthesis models based on high-order structural causal information, addressing the challenge of capturing complex dependencies in synthetic data.
Contribution
It proposes a systematic evaluation framework using high-order causal information as prior knowledge, including datasets, tasks, and metrics for assessing synthesis models.
Findings
The benchmark reveals significant gaps between the ideal and actual performance of tabular synthesis models.
The evaluation shows how baseline methods differ in capturing high-order structures.
The framework enables assessment of a model's capability to represent complex causal relationships.
Abstract
Tabular synthesis models remain ineffective at capturing complex dependencies, and the quality of synthetic data is still insufficient for comprehensive downstream tasks, such as prediction under distribution shifts, automated decision-making, and cross-table understanding. A major challenge is the lack of prior knowledge about underlying structures and high-order relationships in tabular data. We argue that a systematic evaluation on high-order structural information for tabular data synthesis is the first step towards solving the problem. In this paper, we introduce high-order structural causal information as natural prior knowledge and provide a benchmark framework for the evaluation of tabular synthesis models. The framework allows us to generate benchmark datasets with a flexible range of data generation processes and to train tabular synthesis models using these datasets for…
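To make the idea of generating benchmark data from a known data generation process concrete, here is a minimal sketch. It is not the paper's framework: it assumes a linear-Gaussian structural causal model with a chain X -> Y -> Z, and the names `sample_chain_scm` and `partial_corr` are hypothetical helpers introduced only for illustration. The chain implies one piece of structural information that goes beyond pairwise correlation: X and Z are dependent marginally but independent given Y. A synthesis model that preserves this structure should produce synthetic tables in which the same pattern holds.

```python
import numpy as np


def sample_chain_scm(n, seed=0):
    """Sample n rows from an assumed linear-Gaussian chain X -> Y -> Z."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)               # exogenous root variable
    y = 2.0 * x + rng.normal(size=n)     # Y depends only on X
    z = -1.5 * y + rng.normal(size=n)    # Z depends only on Y
    return np.column_stack([x, y, z])


def partial_corr(data, i, j, k):
    """Correlation of columns i and j after regressing both on column k."""
    design = np.column_stack([np.ones(len(data)), data[:, k]])
    res_i = data[:, i] - design @ np.linalg.lstsq(design, data[:, i], rcond=None)[0]
    res_j = data[:, j] - design @ np.linalg.lstsq(design, data[:, j], rcond=None)[0]
    return float(np.corrcoef(res_i, res_j)[0, 1])


data = sample_chain_scm(20000)

# X and Z are strongly correlated marginally ...
marginal = float(np.corrcoef(data[:, 0], data[:, 2])[0, 1])
# ... but nearly uncorrelated once Y is controlled for.
conditional = partial_corr(data, 0, 2, 1)

print(f"corr(X, Z)     = {marginal:.3f}")
print(f"corr(X, Z | Y) = {conditional:.3f}")
```

Running the same two checks on a table produced by a synthesis model, and comparing them with the values implied by the known causal graph, is one simple way to see whether that structure survived. In the linear-Gaussian case a vanishing partial correlation corresponds to conditional independence; other data generation processes would need a different test.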
Taxonomy
Topics: Scientific Computing and Data Management
Methods: Causal inference
