Flow Matching for Tabular Data Synthesis
Bahrul Ilmi Nasution, Floor Eijkelboom, Mark Elliot, Richard Allmendinger, Christian A. Naesseth

TL;DR
This paper explores flow matching methods for generating synthetic tabular data, demonstrating performance and efficiency superior to those of diffusion models, with implications for privacy and data utility.
Contribution
The paper introduces and empirically evaluates flow matching techniques for tabular data synthesis, highlighting their advantages over existing diffusion-based methods.
Findings
Flow matching, especially TabbyFlow, outperforms diffusion baselines.
Flow matching achieves high-quality data synthesis with fewer than 100 sampling steps.
Choice of probability path affects data utility and privacy risk.
Abstract
Synthetic data generation is an important tool for privacy-preserving data sharing. Although diffusion models have set recent benchmarks, flow matching (FM) offers a promising alternative. This paper presents different ways to implement FM for tabular data synthesis. We provide a comprehensive empirical study that compares flow matching (FM and variational FM) with state-of-the-art diffusion methods (TabDDPM and TabSyn) in tabular data synthesis. We evaluate both the standard Optimal Transport (OT) and the Variance Preserving (VP) probability paths, and also compare deterministic and stochastic samplers (something possible when learning to generate using variational FM), characterising the empirical relationship between data utility and privacy risk. Our key findings reveal that FM, particularly TabbyFlow, outperforms diffusion baselines. Flow matching methods also achieve…
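To make the two probability paths and the deterministic sampler named in the abstract concrete, here is a minimal Python sketch. It is not the authors' code or TabbyFlow. The function names (ot_path, vp_path, euler_sample, oracle_velocity) are hypothetical, the OT path assumes sigma_min = 0, and the VP path uses an assumed cosine schedule (alpha_t = sin(pi t / 2), sigma_t = cos(pi t / 2)) that may differ from the schedule used in the paper. Time runs from t = 0 (noise) to t = 1 (data). In practice a neural network is regressed on the target u_t at sampled (x_t, t); here an exact velocity field for a single data row stands in for that network, and the stochastic sampler of variational FM is not shown.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def ot_path(x0: Vector, x1: Vector, t: float) -> Tuple[Vector, Vector]:
    """OT (straight-line) conditional path, assuming sigma_min = 0.

    Returns x_t = (1 - t) * x0 + t * x1 and the regression target
    u_t = d x_t / dt = x1 - x0.
    """
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - a for a, b in zip(x0, x1)]
    return xt, ut


def vp_path(x0: Vector, x1: Vector, t: float) -> Tuple[Vector, Vector]:
    """VP conditional path with an ASSUMED cosine schedule.

    alpha_t = sin(pi t / 2) and sigma_t = cos(pi t / 2) satisfy
    alpha_t**2 + sigma_t**2 = 1, so independent unit-variance noise and
    data give a unit-variance x_t for every t.
    """
    alpha = math.sin(0.5 * math.pi * t)
    sigma = math.cos(0.5 * math.pi * t)
    d_alpha = 0.5 * math.pi * sigma   # d/dt sin(pi t / 2)
    d_sigma = -0.5 * math.pi * alpha  # d/dt cos(pi t / 2)
    xt = [sigma * a + alpha * b for a, b in zip(x0, x1)]
    ut = [d_sigma * a + d_alpha * b for a, b in zip(x0, x1)]
    return xt, ut


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, n_steps: int) -> Vector:
    """Deterministic sampler: Euler-integrate dx/dt = v(x, t) from t = 0 to t = 1."""
    h = 1.0 / n_steps
    x = list(x0)
    for k in range(n_steps):
        v = velocity(x, k * h)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


# Usage: one hypothetical data row c. Under the OT path, the exact marginal
# velocity toward c is (c - x) / (1 - t); it stands in for a trained network.
c = [2.0, -1.0, 0.5]


def oracle_velocity(x: Vector, t: float) -> Vector:
    return [(ci - xi) / (1.0 - t) for ci, xi in zip(c, x)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in c]
sample = euler_sample(oracle_velocity, noise, n_steps=50)
```

With this exact field, Euler integration ends on the data row up to rounding, because each step scales the remaining gap by (1 - t_{k+1}) / (1 - t_k); a learned field only approximates this, which is why the number of sampling steps matters in practice.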
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Data Quality and Management
