Generating Tabular Data Using Heterogeneous Sequential Feature Forest Flow Matching
Ange-Clément Akazan, Alexia Jolicoeur-Martineau, Ioannis Mitliagkas

TL;DR
HS3F is a new method for generating synthetic tabular data that improves speed, quality, and robustness over previous flow-based models by sequentially generating features and using multinomial sampling for categorical variables.
Contribution
The paper introduces HS3F, a novel sequential feature generation approach that addresses speed and accuracy issues in existing flow-based tabular data generators.
Findings
HS3F produces higher-quality synthetic data than Forest Flow (FF).
HS3F is 21-27 times faster than FF on datasets with many categorical variables.
HS3F is more robust than FF to changes in the initial conditions of the flow ODE.
Abstract
Privacy and regulatory constraints make data generation vital to advancing machine learning without relying on real-world datasets. A leading approach for tabular data generation is the Forest Flow (FF) method, which combines Flow Matching with XGBoost. Despite its good performance, FF is slow and makes errors when treating categorical variables as one-hot continuous features. It is also highly sensitive to small changes in the initial conditions of the ordinary differential equation (ODE). To overcome these limitations, we develop Heterogeneous Sequential Feature Forest Flow (HS3F). Our method generates data sequentially (feature-by-feature), reducing the dependency on noisy initial conditions through the additional information from previously generated features. Furthermore, it generates categorical variables using multinomial sampling (from an XGBoost classifier) instead of flow…
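The feature-by-feature generation and multinomial sampling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it covers categorical features only (no flow matching for continuous ones), and an empirical conditional frequency table stands in for the XGBoost classifier that HS3F uses. The names TRAIN_ROWS, fit_conditional_table, sample_categorical, and generate_rows are hypothetical and chosen for illustration.

```python
import random
from collections import Counter, defaultdict

# Toy categorical dataset (hypothetical): each row is (colour, size, label).
TRAIN_ROWS = [
    ("red", "S", "yes"),
    ("red", "M", "yes"),
    ("red", "M", "no"),
    ("blue", "L", "no"),
    ("blue", "L", "no"),
    ("blue", "S", "yes"),
]


def fit_conditional_table(rows, k):
    """Stand-in for a per-feature classifier.

    Counts how often each value of feature k follows each observed
    prefix (x_0, ..., x_{k-1}), i.e. an empirical P(x_k | x_0..x_{k-1}).
    """
    table = defaultdict(Counter)
    for row in rows:
        table[tuple(row[:k])][row[k]] += 1
    return table


def sample_categorical(counts, rng):
    """Multinomial sampling: draw one category with probability
    proportional to its count, rather than taking the most likely one."""
    categories = list(counts.keys())
    weights = list(counts.values())
    return rng.choices(categories, weights=weights, k=1)[0]


def generate_rows(rows, n_samples, seed=0):
    """Generate rows one feature at a time, each feature conditioned
    on the features already generated for that row."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    tables = [fit_conditional_table(rows, k) for k in range(n_features)]
    samples = []
    for _ in range(n_samples):
        new_row = []
        for k in range(n_features):
            context = tuple(new_row)  # previously generated features
            new_row.append(sample_categorical(tables[k][context], rng))
        samples.append(tuple(new_row))
    return samples


if __name__ == "__main__":
    for row in generate_rows(TRAIN_ROWS, 5, seed=1):
        print(row)
```

Because this table conditions on the exact prefix, every generated row reproduces some training row. A learned classifier, as in the method the abstract describes, would instead generalise across contexts and output class probabilities for prefixes it has not seen.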
Taxonomy
Topics: Time Series Analysis and Forecasting · Data Management and Algorithms · Video Analysis and Summarization
