TabTreeFormer: Tabular Data Generation Using Hybrid Tree-Transformer
Jiayu Li, Bingyin Zhao, Zilong Zhao, Uzair Javaid, Kevin Yee, Biplab Sikdar

TL;DR
TabTreeFormer is a hybrid tree-transformer model designed for tabular data generation, integrating domain-specific inductive biases and a novel tokenizer to improve efficiency, fidelity, and utility across diverse datasets.
Contribution
It introduces a hybrid architecture combining tree-based inductive biases with transformers and a new tokenizer for better tabular data modeling.
Findings
Outperforms baseline models in utility, fidelity, and privacy metrics.
Achieves up to 44% performance gain in utility-focused scenarios.
Demonstrates consistent improvements across nine datasets.
Abstract
Transformers have shown impressive results in tabular data generation. However, they lack domain-specific inductive biases which are critical for preserving the intrinsic characteristics of tabular data. They also suffer from poor scalability and efficiency due to quadratic computational complexity. In this paper, we propose TabTreeFormer, a hybrid transformer architecture that integrates inductive biases of tree-based models (i.e., non-smoothness and non-rotational invariance) to effectively handle the discrete and weakly correlated features in tabular datasets. To improve numerical fidelity and capture multimodal distributions, we introduce a novel tokenizer that learns token sequences based on the complexity of tabular values. This reduces vocabulary size and sequence length, yielding more compact and efficient representations without sacrificing performance. We evaluate…
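The abstract says the tokenizer reduces vocabulary size and sequence length but does not spell out the mechanism. As a rough illustration of the general idea only, the sketch below maps each numeric value to a single token by quantile binning. This is an assumed stand-in technique, not the paper's tokenizer; the function names, the `num_` token prefix, and the sample column are all hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_bin_edges(values, n_bins):
    """Return n_bins - 1 interior cut points at the empirical quantiles."""
    return quantiles(values, n=n_bins, method="inclusive")


def tokenize(value, edges, prefix="num"):
    """Map a number to one token, e.g. 'num_2', naming the bin it falls into."""
    return f"{prefix}_{bisect_right(edges, value)}"


def detokenize(token, edges, lo, hi):
    """Map a token back to the midpoint of its bin (a lossy reconstruction)."""
    idx = int(token.rsplit("_", 1)[1])
    bounds = [lo, *edges, hi]
    return (bounds[idx] + bounds[idx + 1]) / 2


# Hypothetical numeric column; not data from the paper.
ages = [18, 21, 22, 25, 31, 38, 44, 52, 60, 75]
edges = fit_bin_edges(ages, n_bins=4)

# One token per value, drawn from a vocabulary of at most n_bins entries.
tokens = [tokenize(a, edges) for a in ages]
vocab = sorted(set(tokens))
recovered = [detokenize(t, edges, min(ages), max(ages)) for t in tokens]
```

With this scheme the vocabulary is bounded by the number of bins and each value costs one token, in contrast to digit-by-digit text encodings where both grow with numeric precision. The trade-off is that reconstruction is only accurate to within a bin.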
Taxonomy
Topics: Video Analysis and Summarization · Human Motion and Animation · Handwritten Text Recognition Techniques
Methods: Sparse Evolutionary Training
