Generative Modeling of Complex Data
Luca Canale, Nicolas Grislain, Grégoire Lothe, Johan Leduc

TL;DR
This paper introduces a flexible generative framework for complex, nested data structures using causal transformers, significantly improving synthetic data quality for intricate real-world datasets.
Contribution
It presents a novel generic framework and a practical transformer-based implementation capable of synthesizing complex hierarchical data structures.
Findings
Outperforms current state-of-the-art models on standard benchmark datasets.
Achieves very strong results on two complex hierarchical datasets with multiple nesting and sparse data.
Improves both machine learning utility and statistical similarity on the benchmarks.
Abstract
In recent years, several models have improved the capacity to generate synthetic tabular datasets. However, such models focus on synthesizing simple columnar tables and are not usable on real-life data with complex structures. This paper puts forward a generic framework to synthesize more complex data structures with composite and nested types. It then proposes one practical implementation, built with causal transformers, for structs (mappings of types) and lists (repeated instances of a type). The results on standard benchmark datasets show that this implementation consistently outperforms current state-of-the-art models in terms of both machine learning utility and statistical similarity. Moreover, it shows very strong results on two complex hierarchical datasets with multiple nesting and sparse data that were previously out of reach.
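To make the two composite types named in the abstract concrete, here is a minimal Python sketch of a schema built from structs and lists, together with a depth-first serialization of one nested record into a flat token sequence of the kind an autoregressive (causal) model could consume. This is an illustration under stated assumptions, not the paper's implementation: the class names `Scalar`, `Struct`, `ListOf`, the `flatten` helper, the token format, and the example record are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


# Hypothetical type descriptors; the names are illustrative, not from the paper.
@dataclass
class Scalar:
    name: str  # e.g. "int" or "str"


@dataclass
class Struct:
    fields: Dict[str, "DataType"]  # mapping of field name -> type


@dataclass
class ListOf:
    item: "DataType"  # repeated instances of a single type


DataType = Union[Scalar, Struct, ListOf]


def flatten(value: Any, dtype: DataType) -> List[str]:
    """Serialize a nested value depth-first into a flat list of string tokens."""
    if isinstance(dtype, Scalar):
        return [f"{dtype.name}:{value}"]
    if isinstance(dtype, Struct):
        tokens = ["<struct>"]
        for field_name, field_type in dtype.fields.items():
            tokens.append(f"field:{field_name}")
            tokens.extend(flatten(value[field_name], field_type))
        tokens.append("</struct>")
        return tokens
    if isinstance(dtype, ListOf):
        # Lists have variable length, so explicit open/close markers delimit them.
        tokens = ["<list>"]
        for item in value:
            tokens.extend(flatten(item, dtype.item))
        tokens.append("</list>")
        return tokens
    raise TypeError(f"unsupported type descriptor: {dtype!r}")


# Assumed example: a customer with a variable-length list of orders.
schema = Struct({
    "customer_id": Scalar("int"),
    "orders": ListOf(Struct({"item": Scalar("str"), "qty": Scalar("int")})),
})
record = {
    "customer_id": 7,
    "orders": [{"item": "pen", "qty": 2}, {"item": "ink", "qty": 1}],
}

tokens = flatten(record, schema)
print(tokens)
```

The point of the sketch is only that a nested record with a variable number of children does not fit a fixed set of columns, while a sequence representation handles it naturally. How the paper actually encodes values and conditions the transformer is not specified in the abstract.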
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Quality and Management
