REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers
Aivin V. Solatorio, Olivier Dupriez

TL;DR
REaLTabFormer is a transformer-based model that generates realistic tabular and relational data. It captures relational structure better than baseline models and achieves state-of-the-art results on prediction tasks without fine-tuning.
Contribution
It introduces a new generative framework combining autoregressive and sequence-to-sequence transformers for relational data synthesis.
Findings
Outperforms baseline models in capturing relational structure
Achieves state-of-the-art results on prediction tasks
Effective in generating large non-relational datasets
Abstract
Tabular data is a common form of organizing data. Multiple models are available to generate synthetic tabular datasets where observations are independent, but few have the ability to produce relational datasets. Modeling relational data is challenging as it requires modeling both a "parent" table and its relationships across tables. We introduce REaLTabFormer (Realistic Relational and Tabular Transformer), a tabular and relational synthetic data generation model. It first creates a parent table using an autoregressive GPT-2 model, then generates the relational dataset conditioned on the parent table using a sequence-to-sequence (Seq2Seq) model. We implement target masking to prevent data copying and propose the Q_δ statistic and statistical bootstrapping to detect overfitting. Experiments using real-world datasets show that REaLTabFormer captures the relational structure better…
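The two-stage generation described in the abstract can be sketched as follows. This is a toy illustration of the data flow only, not the authors' implementation: simple resampling stands in for both the GPT-2 parent model and the Seq2Seq child model, and every table, column name, and function name below is hypothetical. Because the stand-ins reuse observed values, the sketch has none of the anti-copying safeguards the paper describes.

```python
import random

# Toy observed data: a parent table (stores) and a child table (sales)
# linked by "store_id". All names and values are made up for illustration.
parent_rows = [
    {"store_id": 1, "store_type": "a"},
    {"store_id": 2, "store_type": "b"},
    {"store_id": 3, "store_type": "a"},
]
child_rows = [
    {"store_id": 1, "sales": 10},
    {"store_id": 1, "sales": 12},
    {"store_id": 2, "sales": 30},
    {"store_id": 3, "sales": 11},
    {"store_id": 3, "sales": 9},
    {"store_id": 3, "sales": 13},
]


def sample_parents(rng, n):
    """Stage 1 stand-in: draw n synthetic parent rows.

    The paper uses an autoregressive GPT-2 model here; this sketch just
    resamples the observed store types.
    """
    types = [row["store_type"] for row in parent_rows]
    return [{"synthetic_id": i, "store_type": rng.choice(types)} for i in range(n)]


def sample_children(rng, parent):
    """Stage 2 stand-in: draw child rows conditioned on one parent row.

    The paper uses a Seq2Seq model here; this sketch conditions on the
    parent's store type by looking at real parents of the same type.
    """
    matching_ids = [
        p["store_id"] for p in parent_rows if p["store_type"] == parent["store_type"]
    ]
    # Pool of child values seen under parents of the same type.
    pool = [c["sales"] for c in child_rows if c["store_id"] in matching_ids]
    # The number of children varies per parent, as in real relational data:
    # borrow the child count of one randomly chosen matching real parent.
    donor_id = rng.choice(matching_ids)
    n_children = sum(1 for c in child_rows if c["store_id"] == donor_id)
    return [
        {"synthetic_id": parent["synthetic_id"], "sales": rng.choice(pool)}
        for _ in range(n_children)
    ]


def generate(n_parents, seed=0):
    """Generate the parent table first, then the children for each parent."""
    rng = random.Random(seed)
    parents = sample_parents(rng, n_parents)
    children = [child for p in parents for child in sample_children(rng, p)]
    return parents, children


if __name__ == "__main__":
    parents, children = generate(n_parents=4, seed=42)
    print(parents)
    print(children)
```

The point of the sketch is the ordering: child rows are only produced after a parent row exists, and each child carries the identifier of the parent it was conditioned on, so the synthetic tables can be joined the same way as the originals.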
Taxonomy
Topics: Time Series Analysis and Forecasting · Data Quality and Management · Machine Learning in Healthcare
Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Residual Connection · Weight Decay · Discriminative Fine-Tuning · Dropout · Dense Connections · Attention Dropout
