Copula Flows for Synthetic Data Generation
Sanket Kamthe, Samuel Assefa, Marc Deisenroth

TL;DR
This paper proposes copula flows, a synthetic data generator built from copula-based normalizing flows. It offers an interpretable probabilistic alternative to GANs and is particularly effective on data with mixed data types.
Contribution
It proposes a copula flow model that estimates the univariate marginals and the copula density separately using normalizing flows, which improves interpretability and allows the model to handle mixed data types.
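For reference, the separation described above rests on Sklar's theorem. For continuous marginal CDFs F_1, ..., F_d with densities p_1, ..., p_d, the joint density factorises into a copula density c and the product of the marginal densities. The notation below is ours and not necessarily the paper's.

```latex
p(x_1,\dots,x_d) = c\big(F_1(x_1),\dots,F_d(x_d)\big)\prod_{i=1}^{d} p_i(x_i),
\qquad
\log p(\mathbf{x}) = \log c(\mathbf{u}) + \sum_{i=1}^{d}\log p_i(x_i),
\quad u_i = F_i(x_i).
```

Because the log-likelihood splits into one copula term and d marginal terms, the marginals can be fitted first and the copula density fitted afterwards on the transformed values u.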
Findings
Achieves high-fidelity synthetic data generation.
Performs well on density estimation benchmarks.
Outperforms GAN-based methods on mixed data types.
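To illustrate the two-stage structure described under Contribution (marginals first, dependence second), the sketch below swaps both normalizing flows for much simpler stand-ins: empirical CDFs for the univariate marginals and a Gaussian copula for the dependence. It is not the paper's copula flow model. All function names are hypothetical, only continuous columns are handled, and the code uses the Python standard library only.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used by the Gaussian-copula stand-in
JITTER = 1e-9              # keeps the Cholesky factorisation stable for near-collinear columns


def empirical_cdf(column):
    """Map each value to rank / (n + 1), a pseudo-observation strictly inside (0, 1)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def empirical_quantile(sorted_column, u):
    """Inverse empirical CDF: return an observed value at quantile level u."""
    idx = min(int(u * len(sorted_column)), len(sorted_column) - 1)
    return sorted_column[idx]


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cholesky(a):
    """Lower-triangular L such that L times its transpose equals a (a must be SPD)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (a[i][i] - s) ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def fit_gaussian_copula(rows):
    """Stage 1: empirical marginals. Stage 2: Gaussian copula on normal scores."""
    columns = [list(c) for c in zip(*rows)]
    d = len(columns)
    marginals = [sorted(c) for c in columns]
    scores = [[STD_NORMAL.inv_cdf(u) for u in empirical_cdf(c)] for c in columns]
    corr = [[1.0 + JITTER if i == j else pearson(scores[i], scores[j])
             for j in range(d)] for i in range(d)]
    return marginals, cholesky(corr)


def sample_gaussian_copula(model, m, rng):
    """Draw z ~ N(0, R), map it to u = Phi(z), then push u through the inverse marginals."""
    marginals, L = model
    d = len(marginals)
    out = []
    for _ in range(m):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(d)]
        u = [STD_NORMAL.cdf(zi) for zi in z]
        out.append(tuple(empirical_quantile(marginals[i], u[i]) for i in range(d)))
    return out


# Usage: two strongly dependent columns, the second with a skewed (cubed) marginal.
rng = random.Random(0)
data = []
for _ in range(500):
    x = rng.gauss(0.0, 1.0)
    y = 2.0 * x + rng.gauss(0.0, 0.5)
    data.append((x, y ** 3))
model = fit_gaussian_copula(data)
synthetic = sample_gaussian_copula(model, 1000, rng)
```

The marginals and the dependence are fitted independently, so either stage could be replaced (for example by a univariate flow per column, or by a flow-based copula density) without touching the other. The empirical quantile only returns observed values, which is one reason a learned, smooth marginal is preferable in practice.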
Abstract
The ability to generate high-fidelity synthetic data is crucial when available (real) data is limited or where privacy and data protection standards allow only for limited use of the given data, e.g., in medical and financial datasets. Current state-of-the-art methods for synthetic data generation are based on generative models, such as Generative Adversarial Networks (GANs). Even though GANs have achieved remarkable results in synthetic data generation, they are often challenging to interpret. Furthermore, GAN-based methods can suffer when used with mixed real and categorical variables. Moreover, the design of the loss function (the discriminator loss) is itself problem-specific, i.e., the generative model may not be useful for tasks it was not explicitly trained for. In this paper, we propose to use a probabilistic model as a synthetic data generator. Learning the probabilistic model for the data is…
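The abstract singles out mixed real and categorical variables. One standard way to bring a categorical column into a copula model, shown here as an assumption rather than as the paper's exact treatment, is the randomized distributional transform: a category x is encoded as a uniform draw from [F(x-), F(x)) of its step CDF, and a uniform u is decoded with the inverse step CDF. The class and method names below are hypothetical.

```python
import bisect
import random
from collections import Counter


class CategoricalMarginal:
    """Empirical marginal of one categorical column (hypothetical helper).

    Categories are placed in sorted order; any fixed order works, but the
    copula then models dependence with respect to that (arbitrary) order.
    """

    def __init__(self, column):
        counts = Counter(column)
        n = len(column)
        self.categories = sorted(counts)
        self.cum = []  # cum[k] = F(categories[k]), the step CDF at each category
        running = 0
        for cat in self.categories:
            running += counts[cat]
            self.cum.append(running / n)

    def to_uniform(self, value, rng):
        """Distributional transform: draw u uniformly from [F(x-), F(x))."""
        k = self.categories.index(value)
        lo = self.cum[k - 1] if k > 0 else 0.0
        return lo + rng.random() * (self.cum[k] - lo)

    def from_uniform(self, u):
        """Inverse step CDF: the first category whose cumulative probability exceeds u."""
        k = bisect.bisect_right(self.cum, u)
        return self.categories[min(k, len(self.categories) - 1)]  # clamp guards u == 1.0


rng = random.Random(0)
colors = ["red"] * 50 + ["green"] * 30 + ["blue"] * 20
marginal = CategoricalMarginal(colors)
us = [marginal.to_uniform(c, rng) for c in colors]
recovered = [marginal.from_uniform(u) for u in us]
# Fresh uniforms (in a full model they would come from the copula) decode to
# categories with approximately the empirical frequencies.
samples = [marginal.from_uniform(rng.random()) for _ in range(10_000)]
```

Each category occupies an interval of width equal to its frequency, so decoding uniform draws reproduces the empirical category frequencies, and decoding an encoded value returns the original category.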
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Time Series Analysis and Forecasting · Data Analysis with R
