Tabular Foundation Model for Generative Modelling
Xiangjian Jiang, Mingxuan Liu, Nikola Simidjievski, Tassilo Klein, Mateja Jamnik

TL;DR
This paper introduces TabFORGE, a novel generative model for tabular data that leverages causal structure and a two-stage training process to produce high-quality synthetic datasets.
Contribution
The paper presents TabFORGE, a causality-aware, two-stage pretrained generative model for tabular data, addressing limitations of existing generators in capturing structural fidelity.
Findings
TabFORGE outperforms 22 benchmark methods across 45 datasets in synthetic data quality.
It effectively captures the causal structure of heterogeneous tabular data.
The model demonstrates strong structural fidelity in generated datasets.
Abstract
Generative modelling is a demanding test of foundation models, because it requires robust, holistic representation learning for a given data modality, rather than optimisation for a supervised prediction target alone. While recent work on tabular foundation models has achieved remarkable progress in predictive modelling, generative tabular foundation models remain underexplored. Existing tabular foundation generators, in particular, have not yet consistently matched strong dataset-specific generators in synthetic data quality. A key reason is their misalignment with the distinctive causal structural prior of heterogeneous tabular data. In this paper, we address this gap by introducing a novel tabular foundation model, TabFORGE, built on pretrained Tabular FOundational Representations for GEneration. TabFORGE is designed to utilise the…
