Improving TabPFN's Synthetic Data Generation by Integrating Causal Structure
Davide Tugnoli, Andrea De Lorenzo, Marco Virgolin, Giovanni Cinà

TL;DR
This paper enhances TabPFN's synthetic tabular data generation by integrating causal structure, addressing limitations caused by feature order dependencies, and improving data quality and causal effect preservation.
Contribution
It introduces DAG-aware conditioning and CPDAG-based strategies to incorporate causal structure into TabPFN's autoregressive generation process.
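To make the difference between the two conditioning schemes concrete, the sketch below compares which features end up in each conditioning set. It is a minimal illustration, not the authors' implementation: it calls no TabPFN API, the function names and the three-variable chain X -> Y -> Z are hypothetical, and it only computes conditioning sets rather than sampling any data.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def conditioning_sets_by_column_order(columns):
    """Order-based autoregressive scheme: each feature conditions on
    every feature that appears before it in the input column order."""
    return {col: list(columns[:i]) for i, col in enumerate(columns)}


def conditioning_sets_by_dag(parents):
    """DAG-aware scheme: visit features in a topological order of the
    causal graph and condition each one on its causal parents only.

    `parents` maps each node to an iterable of its parent nodes.
    """
    order = list(TopologicalSorter(parents).static_order())
    sets = {node: sorted(parents.get(node, ())) for node in order}
    return order, sets


# Hypothetical causal chain X -> Y -> Z, with columns stored as Z, Y, X,
# so the column order conflicts with the causal order.
columns = ["Z", "Y", "X"]
parents = {"X": [], "Y": ["X"], "Z": ["Y"]}

order_based = conditioning_sets_by_column_order(columns)
dag_order, dag_based = conditioning_sets_by_dag(parents)

print("order-based:", order_based)
print("DAG order:  ", dag_order)
print("DAG-aware:  ", dag_based)
```

In this toy case the order-based scheme generates the effect Z first and then conditions its cause X on both Z and Y, whereas the DAG-aware scheme generates X first and conditions Z on Y alone. In an actual pipeline, each conditioning set would be handed to the generative model to sample the corresponding feature.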
Findings
DAG-aware conditioning improves data quality and stability.
Incorporating causal structure enhances causal effect preservation.
The CPDAG-based strategy yields moderate improvements, depending on how much causal knowledge is available.
Abstract
Synthetic tabular data generation addresses data scarcity and privacy constraints in a variety of domains. Tabular Prior-Data Fitted Network (TabPFN), a recent foundation model for tabular data, has been shown capable of generating high-quality synthetic tabular data. However, TabPFN is autoregressive: features are generated sequentially by conditioning on the previous ones, depending on the order in which they appear in the input data. We demonstrate that when the feature order conflicts with causal structure, the model produces spurious correlations that impair its ability to generate synthetic data and preserve causal effects. We address this limitation by integrating causal structure into TabPFN's generation process through two complementary approaches: Directed Acyclic Graph (DAG)-aware conditioning, which samples each variable given its causal parents, and a Completed Partially…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks · Data Quality and Management
