Copula Flows for Synthetic Data Generation

Sanket Kamthe; Samuel Assefa; Marc Deisenroth

arXiv:2101.00598 · stat.ML · January 5, 2021 · 38 cites

TL;DR

This paper introduces a method for synthetic data generation based on copula normalizing flows. It offers an interpretable probabilistic alternative to GANs and is especially effective on mixed data types.

Contribution

It proposes a copula flow model that estimates the univariate marginals and the copula density separately, each with normalizing flows. This improves interpretability and accommodates mixed data types.
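The separation of marginals and copula follows Sklar's theorem. For continuous variables, the joint density factors as below, where $F_i$ and $p_i$ are the CDF and density of the $i$-th marginal and $c$ is the copula density on $[0,1]^d$. The notation here is ours, not necessarily the paper's.

$$
p(x_1, \dots, x_d) = c\big(F_1(x_1), \dots, F_d(x_d)\big) \prod_{i=1}^{d} p_i(x_i)
$$

Sampling reverses the factorization: draw $u = (u_1, \dots, u_d) \sim c$, then set $x_i = F_i^{-1}(u_i)$ for each $i$.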

Findings

01. Achieves high-fidelity synthetic data generation.
02. Performs well in density estimation benchmarks.
03. Outperforms GAN-based methods on mixed data types.

Abstract

The ability to generate high-fidelity synthetic data is crucial when available (real) data is limited or when privacy and data-protection standards allow only limited use of the given data, e.g., in medical and financial datasets. Current state-of-the-art methods for synthetic data generation are based on generative models, such as Generative Adversarial Networks (GANs). Even though GANs have achieved remarkable results in synthetic data generation, they are often challenging to interpret. Furthermore, GAN-based methods can suffer when used with mixed real and categorical variables. Moreover, the design of the loss function (discriminator loss) is itself problem-specific, i.e., the generative model may not be useful for tasks it was not explicitly trained for. In this paper, we propose to use a probabilistic model as a synthetic data generator. Learning the probabilistic model for the data is…
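To illustrate what "a probabilistic model as a synthetic data generator" can look like in the copula setting, here is a minimal sketch under stated assumptions. It fits the marginals and the dependence structure separately, then samples from the copula and maps back through inverse marginal CDFs. It is not the paper's method: a Gaussian copula and empirical (rank-based) marginals stand in for the normalizing flows, and only continuous columns are handled. The function names `to_uniform`, `fit_gaussian_copula`, and `sample_synthetic` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Assumption: a Gaussian copula plus empirical marginals stand in for the
# flow-based marginal and copula estimators described in the paper.
_STD_NORMAL = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}


def to_uniform(column):
    """Map one continuous column into (0, 1) via its empirical CDF (ranks)."""
    ranks = np.argsort(np.argsort(column)) + 1  # ranks 1..n
    return ranks / (len(column) + 1)            # strictly inside (0, 1)


def fit_gaussian_copula(data):
    """Estimate the Gaussian-copula correlation matrix from data of shape (n, d)."""
    u = np.column_stack([to_uniform(col) for col in data.T])
    z = np.vectorize(_STD_NORMAL.inv_cdf)(u)    # normal scores
    return np.corrcoef(z, rowvar=False)


def sample_synthetic(data, corr, n_samples, seed=0):
    """Draw synthetic rows: sample the copula, then apply inverse marginals."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = np.vectorize(_STD_NORMAL.cdf)(z)        # dependence structure on [0, 1]^d
    # Inverse marginal CDFs: empirical quantiles of each real column.
    return np.column_stack([np.quantile(data[:, j], u[:, j]) for j in range(d)])


# Usage on toy data: a normal column and a skewed column that depends on it.
rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = np.exp(0.8 * x + 0.6 * rng.normal(size=1000))
real = np.column_stack([x, y])

corr = fit_gaussian_copula(real)
synthetic = sample_synthetic(real, corr, n_samples=1000)
print(corr[0, 1], synthetic.shape)
```

Because the marginals and the dependence are fitted separately, each piece can be inspected on its own: `corr` summarizes the dependence, and every synthetic column stays within the observed range of its real counterpart.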

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Time Series Analysis and Forecasting · Data Analysis with R