TL;DR
This paper introduces a divergence estimation method for validating synthetic tabular data. It uses a probabilistic classifier to capture discrepancies in the joint distribution of real and synthetic data, which traditional per-feature (marginal) comparisons miss.
Contribution
It proposes a novel divergence-based validation metric employing a probabilistic classifier to estimate joint distribution differences between real and synthetic data.
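The general idea of estimating a divergence with a probabilistic classifier can be sketched as follows. This is a minimal illustration of the standard density-ratio trick, not the paper's estimator, whose details are not given in this summary. It assumes a 1-D toy problem, a hand-written logistic regression (the hypothetical helpers `fit_logistic_1d` and `estimate_kl`), and the KL divergence as the target. A classifier trained to separate real from synthetic rows has log-odds that approximate log p_real(x) / p_synth(x), and averaging those log-odds over the real rows gives an estimate of KL(real || synthetic).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, n_iter=15):
    """Fit p(y=1 | x) = sigmoid(b + w * x) with Newton's method."""
    b, w = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b + w * x)
            r = p - y            # gradient of the log-loss w.r.t. the logit
            s = p * (1.0 - p)    # curvature weight
            g0 += r
            g1 += r * x
            h00 += s
            h01 += s * x
            h11 += s * x * x
        # Solve the 2x2 Newton system H * step = g in closed form.
        det = h00 * h11 - h01 * h01
        b -= (h11 * g0 - h01 * g1) / det
        w -= (h00 * g1 - h01 * g0) / det
    return b, w


def estimate_kl(real, synth):
    """Classifier-based estimate of KL(real || synth) for 1-D samples."""
    xs = list(real) + list(synth)
    ys = [1.0] * len(real) + [0.0] * len(synth)  # 1 = real, 0 = synthetic
    b, w = fit_logistic_1d(xs, ys)
    # The classifier logit equals log(p_real / p_synth) up to the class prior,
    # so correct for unequal sample sizes.
    prior = math.log(len(synth) / len(real))
    return sum(b + w * x + prior for x in real) / len(real)


rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # toy "real" data: N(0, 1)
synth = [rng.gauss(1.0, 1.0) for _ in range(5000)]   # toy "synthetic" data: N(1, 1)
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # second draw from N(0, 1)

kl_hat = estimate_kl(real, synth)   # analytic KL for this pair is 0.5
kl_same = estimate_kl(real, same)   # should be close to 0
print(f"KL(real || synth) estimate: {kl_hat:.3f}")
print(f"KL(real || same)  estimate: {kl_same:.3f}")
```

The linear logit is well specified here because the log density ratio of two unit-variance Gaussians is linear in x. For real tabular data one would presumably use a more flexible classifier over all columns at once, which is what lets the estimate reflect the joint distribution.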
Findings
Accurately estimates divergences for simple distributions.
Effectively validates synthetic data on real-world datasets.
Outperforms traditional marginal comparison methods.
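A small toy example may help show why per-feature comparisons can be inconclusive. It is an illustration under assumed data, not an experiment from the paper: the "synthetic" table is built by shuffling one column of the "real" table, so every marginal is preserved exactly while the dependence between columns is destroyed. The helper `pearson` and all variable names are hypothetical.

```python
import random


def pearson(a, b):
    # Plain Pearson correlation coefficient of two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(0)

# Toy "real" table: two strongly dependent columns.
x1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x2 = [v + 0.3 * rng.gauss(0.0, 1.0) for v in x1]

# Toy "synthetic" table: same first column, second column shuffled.
# Each column keeps exactly the same values, so any per-feature metric
# (histogram distance, KS statistic, per-column divergence) sees no difference.
x2_synth = x2[:]
rng.shuffle(x2_synth)

marginals_match = sorted(x2) == sorted(x2_synth)
corr_real = pearson(x1, x2)
corr_synth = pearson(x1, x2_synth)

print("marginals identical:", marginals_match)
print(f"correlation in real table:      {corr_real:.3f}")
print(f"correlation in synthetic table: {corr_synth:.3f}")
```

A metric computed feature by feature would rate this synthetic table as a perfect match, whereas a metric defined on the joint distribution can register the lost dependence.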
Abstract
The ever-increasing use of generative models in various fields where tabular data is used highlights the need for robust and standardized validation metrics to assess the similarity between real and synthetic data. Current methods lack a unified framework and rely on diverse and often inconclusive statistical measures. Divergences, which quantify discrepancies between data distributions, offer a promising avenue for validation. However, traditional approaches calculate divergences independently for each feature due to the complexity of joint distribution modeling. This paper addresses this challenge by proposing a novel approach that uses divergence estimation to overcome the limitations of marginal comparisons. Our core contribution lies in applying a divergence estimator to build a validation metric considering the joint distribution of real and synthetic data. We leverage a…
