Generating Synthetic Relational Tabular Data via Structural Causal Models
Frederik Hoppe, Astrid Franz, Lars Kleinemeier, Udo Göbel

TL;DR
This paper introduces a novel framework for generating realistic synthetic relational tabular data with complex inter-table dependencies using structural causal models, addressing a gap in current data generation methods.
Contribution
It extends SCM-based data generation to relational formats, enabling realistic synthesis of interconnected tables with causal relationships.
Findings
The framework generates relational datasets with complex inter-table dependencies.
The synthetic data mimics real-world relational structures.
Experiments validate the realism and utility of the generated data.
Abstract
Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al., 2025), which leverages vast quantities of synthetic tabular datasets derived from structural causal models (SCMs), demonstrates the critical role synthetic data plays in developing powerful tabular foundation models. However, most real-world tabular data exists in relational formats spanning multiple interconnected tables, a structure not adequately addressed by current generation methods. In this work, we extend the SCM-based approach by developing a novel framework that generates realistic synthetic relational tabular data including causal relationships across tables. Our experiments confirm that this framework is able to construct relational datasets with complex inter-table…
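To make the idea of causal relationships across tables concrete, here is a minimal sketch of a toy two-table SCM. It is not the authors' framework: the function name `generate_relational_scm`, the column names (`age`, `income`, `amount`), and the linear structural equations are all hypothetical choices made for illustration. A parent table is sampled first, and each child row is linked to a parent by a foreign key and draws one of its values from a structural equation that depends on a parent attribute.

```python
import random


def generate_relational_scm(n_parents=5, max_children=3, seed=0):
    """Toy two-table SCM (illustrative only, not the paper's method).

    Returns a parent table and a child table as lists of dicts. The child
    table references the parent table through the `parent_id` foreign key.
    """
    rng = random.Random(seed)
    parents, children = [], []
    for pid in range(n_parents):
        # Within-table structural equations: age is exogenous,
        # income := f(age, noise). Coefficients are arbitrary assumptions.
        age = rng.uniform(18, 80)
        income = 1000.0 + 50.0 * age + rng.gauss(0, 100)
        parents.append({"parent_id": pid, "age": age, "income": income})

        # Each parent gets between 1 and max_children child rows.
        n_children = rng.randint(1, max_children)
        for _ in range(n_children):
            # Cross-table structural equation:
            # child.amount := g(parent.income, noise).
            amount = 0.01 * income + rng.gauss(0, 1)
            children.append(
                {"child_id": len(children), "parent_id": pid, "amount": amount}
            )
    return parents, children


parents, children = generate_relational_scm()
print(len(parents), "parent rows,", len(children), "child rows")
```

Sampling parents before children mirrors a topological ordering over the causal graph: a child value can only be computed once the parent attributes it depends on exist. A realistic generator would presumably randomize the graph, the functional forms, and the table schema rather than hard-coding them as done here.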
