ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models
Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Stephanie Hazlewood, Xi He

TL;DR
ClavaDDPM is a new diffusion-based model that effectively synthesizes complex multi-relational tabular data by capturing long-range dependencies and relationships across multiple tables, outperforming existing methods.
Contribution
The paper introduces a cluster-guided diffusion model that efficiently captures inter-table relationships and long-range dependencies when synthesizing multi-relational data.
Findings
Outperforms existing methods on multi-table datasets
Effectively captures long-range dependencies across tables
Maintains competitive utility metrics for single-table data
Abstract
Recent research in tabular data synthesis has focused on single tables, whereas real-world applications often involve complex data with tens or hundreds of interconnected tables. Previous approaches to synthesizing multi-relational (multi-table) data fall short in two key aspects: scalability for larger datasets and capturing long-range dependencies, such as correlations between attributes spread across different tables. Inspired by the success of diffusion models in tabular data modeling, we introduce Cluster Latent Variable guided Denoising Diffusion Probabilistic Models (ClavaDDPM). This novel approach leverages clustering labels as intermediaries to model relationships between tables, specifically focusing on foreign key constraints. ClavaDDPM leverages the robust generation capabilities of diffusion…
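The idea of using cluster labels as intermediaries across a foreign key can be illustrated with a toy sketch. This is not the authors' implementation: the tables, column names, and helper functions below are hypothetical, plain k-means stands in for whatever clustering the paper uses, and a per-cluster Gaussian sampler stands in for the conditional diffusion model. The sketch joins child rows to their parent through the foreign key, clusters the joined rows, gives each parent the majority label of its children, and then generates child values conditioned on that label.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical toy tables: parents keyed by id, children holding a foreign key.
parents = {1: {"income": 20.0}, 2: {"income": 22.0},
           3: {"income": 80.0}, 4: {"income": 85.0}}
children = [
    {"parent_id": 1, "amount": 1.0}, {"parent_id": 1, "amount": 1.5},
    {"parent_id": 2, "amount": 2.0}, {"parent_id": 3, "amount": 9.0},
    {"parent_id": 3, "amount": 9.5}, {"parent_id": 4, "amount": 10.0},
]

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])))

def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic init (evenly spaced sorted points)."""
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [tuple(mean(dim) for dim in zip(*groups[c])) if groups[c]
                   else centers[c] for c in range(k)]
    return centers

# Step 1: join each child row with its parent's features via the foreign key.
joined = [(parents[row["parent_id"]]["income"], row["amount"]) for row in children]

# Step 2: cluster the joined rows (assumption: k-means with k=2).
centers = kmeans(joined, k=2)

# Step 3: label each parent with the majority cluster among its children.
votes = defaultdict(Counter)
for row, point in zip(children, joined):
    votes[row["parent_id"]][nearest(point, centers)] += 1
parent_labels = {pid: counts.most_common(1)[0][0] for pid, counts in votes.items()}

# Step 4: fit a conditional sampler per cluster label.
# Assumption: a Gaussian is only a stand-in for a conditional diffusion model.
by_cluster = defaultdict(list)
for row in children:
    by_cluster[parent_labels[row["parent_id"]]].append(row["amount"])
stats = {c: (mean(v), pstdev(v)) for c, v in by_cluster.items()}

def sample_child_amount(cluster, rng):
    """Draw a synthetic child value conditioned on the parent's cluster label."""
    mu, sigma = stats[cluster]
    return rng.gauss(mu, sigma)

rng = random.Random(0)
synthetic = {pid: sample_child_amount(label, rng)
             for pid, label in parent_labels.items()}
```

In this toy data, parents 1 and 2 end up sharing one label and parents 3 and 4 the other, so synthetic child amounts follow the parent's income group without the child sampler ever reading the parent's columns directly. The label is what carries the cross-table correlation.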
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms
Methods: Diffusion
