A supervised generative optimization approach for tabular data
Shinpei Nakamura-Sakai, Fadi Hamad, Saheed Obitayo, Vamsi K. Potluru

TL;DR
This paper introduces a supervised generative optimization framework for creating synthetic tabular data, incorporating downstream task information and meta-learning to improve data utility and relevance.
Contribution
It proposes a novel framework that combines supervised learning and meta-learning to optimize synthetic data generation for specific downstream tasks.
Findings
Enhanced synthetic data relevance for downstream tasks
Meta-learning improves distribution mixture selection
Framework outperforms unsupervised methods in utility
Abstract
Synthetic data generation has emerged as a crucial topic for financial institutions, driven by factors such as privacy protection and data augmentation. Many algorithms have been proposed for synthetic data generation, but reaching a consensus on which method to use for a specific data set and use case remains challenging. Moreover, the majority of existing approaches are "unsupervised" in the sense that they do not take the downstream task into account. To address these issues, this work presents a novel synthetic data generation framework. The framework integrates a supervised component tailored to the specific downstream task and employs a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.
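The idea of learning a mixture over existing synthetic distributions, scored by a downstream task, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the optimizer or the downstream model, so the sketch substitutes plain random search over the probability simplex for the meta-learning step and a least-squares linear classifier for the downstream task. All function names (sample_mixture, downstream_accuracy, search_mixture) and the toy data are hypothetical.

```python
import numpy as np

def sample_mixture(sources, weights, n_rows, rng):
    """Draw n_rows rows from the synthetic sources in proportion to weights."""
    counts = rng.multinomial(n_rows, weights)
    X_parts, y_parts = [], []
    for (X, y), c in zip(sources, counts):
        idx = rng.integers(0, len(X), size=c)  # sample rows with replacement
        X_parts.append(X[idx])
        y_parts.append(y[idx])
    return np.vstack(X_parts), np.concatenate(y_parts)

def downstream_accuracy(X_train, y_train, X_val, y_val):
    """Placeholder downstream task: least-squares linear classifier, labels in {0, 1}."""
    A = np.hstack([X_train, np.ones((len(X_train), 1))])
    coef, *_ = np.linalg.lstsq(A, 2.0 * y_train - 1.0, rcond=None)
    A_val = np.hstack([X_val, np.ones((len(X_val), 1))])
    pred = (A_val @ coef > 0).astype(int)
    return float((pred == y_val).mean())

def search_mixture(sources, X_val, y_val, n_trials=50, n_rows=500, seed=0):
    """Random search over mixture weights (a stand-in for the meta-learning step)."""
    rng = np.random.default_rng(seed)
    k = len(sources)
    # Candidates: uniform mixture, each single source, then random simplex points.
    candidates = [np.full(k, 1.0 / k)] + list(np.eye(k))
    candidates += list(rng.dirichlet(np.ones(k), size=n_trials))
    best_w, best_score = candidates[0], -1.0
    for w in candidates:
        X_mix, y_mix = sample_mixture(sources, w, n_rows, rng)
        score = downstream_accuracy(X_mix, y_mix, X_val, y_val)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

# Toy data (assumption): two Gaussian classes; the second "generator" has flipped labels.
toy_rng = np.random.default_rng(1)

def toy(n, flip=False):
    y = toy_rng.integers(0, 2, size=n)
    X = toy_rng.normal(size=(n, 2)) + 4.0 * y[:, None] - 2.0
    return X, (1 - y if flip else y)

X_val, y_val = toy(200)                    # held-out real data
sources = [toy(300), toy(300, flip=True)]  # outputs of two hypothetical generators
best_w, best_score = search_mixture(sources, X_val, y_val)
print("mixture weights:", best_w, "validation accuracy:", best_score)
```

The supervised signal enters only through the validation score on real data, so a generator that harms the downstream task tends to receive little weight. A real setup would replace the random search with whatever optimizer the framework prescribes and the linear classifier with the actual downstream model.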
Taxonomy
Topics: Bayesian Methods and Mixture Models · Data Management and Algorithms
