A density ratio framework for evaluating the utility of synthetic data
Thom Benjamin Volker, Peter-Paul de Wolf, Erik-Jan van Kesteren

TL;DR
This paper introduces a density ratio estimation framework to evaluate the quality of synthetic data, providing more accurate and interpretable utility measures to improve data synthesis and downstream analysis.
Contribution
It proposes a novel density ratio-based method for assessing synthetic data utility, enhancing accuracy and interpretability over existing measures.
Findings
Density ratio estimation improves global utility assessment.
The estimator requires little to no manual tuning, because the nonparametric density ratio model is selected automatically.
Application demonstrates improved downstream analysis.
Abstract
Synthetic data generation is a promising technique to facilitate the use of sensitive data while mitigating the risk of privacy breaches. However, for synthetic data to be useful in downstream analysis tasks, it needs to be of sufficient quality. Various methods have been proposed to measure the utility of synthetic data, but their results are often incomplete or even misleading. In this paper, we propose using density ratio estimation to improve quality evaluation for synthetic data, and thereby the quality of synthesized datasets. We show how this framework relates to and builds on existing measures, yielding global and local utility measures that are informative and easy to interpret. We develop an estimator which requires little to no manual tuning due to automatic selection of a nonparametric density ratio model. Through simulations, we find that density ratio estimation yields…
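To make the idea in the abstract concrete: the density ratio r(x) = p_obs(x) / p_syn(x) compares the observed and synthetic data distributions, and it can be estimated directly from the two samples. One plausible reading of "global and local utility" is that a divergence computed from the estimated ratio summarizes overall fit, while the pointwise values of r(x) show where the synthetic data deviate. The following is a minimal sketch and not the authors' estimator. It assumes unconstrained least-squares importance fitting (uLSIF) with a Gaussian kernel. The names `fit_ulsif` and `pearson_divergence`, the fixed bandwidth `sigma`, and the regularization strength `lam` are illustrative choices. The automatic model selection mentioned in the abstract is not reproduced here.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    # Gaussian kernel between every row of x and every row of centers.
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_ulsif(observed, synthetic, sigma=1.0, lam=0.1, n_centers=50, seed=0):
    """Estimate r(x) = p_obs(x) / p_syn(x) with a uLSIF-style kernel model.

    sigma and lam are fixed by hand here (an assumption of this sketch);
    in practice they would be chosen by cross-validation or similar.
    """
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(observed))
    idx = rng.choice(len(observed), size=n_centers, replace=False)
    centers = observed[idx]  # kernel centers taken from the observed sample

    k_obs = gaussian_kernel(observed, centers, sigma)   # numerator sample
    k_syn = gaussian_kernel(synthetic, centers, sigma)  # denominator sample

    # Regularized least-squares solution: (H + lam * I) alpha = h.
    H = k_syn.T @ k_syn / len(synthetic)
    h = k_obs.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(n_centers), h)

    def ratio(x):
        # Clip at zero because a density ratio cannot be negative.
        return np.maximum(gaussian_kernel(x, centers, sigma) @ alpha, 0.0)

    return ratio


def pearson_divergence(ratio, observed, synthetic):
    # Plug-in Pearson divergence estimate; values near 0 suggest similar samples.
    r_obs = ratio(observed)
    r_syn = ratio(synthetic)
    return r_obs.mean() - 0.5 * (r_syn ** 2).mean() - 0.5


if __name__ == "__main__":
    # Toy data (hypothetical): one faithful and one mean-shifted synthetic set.
    rng = np.random.default_rng(1)
    observed = rng.normal(size=(500, 2))
    good_syn = rng.normal(size=(500, 2))
    bad_syn = rng.normal(size=(500, 2)) + np.array([2.0, 0.0])
    for name, syn in [("good", good_syn), ("bad", bad_syn)]:
        r = fit_ulsif(observed, syn)
        print(name, pearson_divergence(r, observed, syn))
```

In this toy setup the mean-shifted synthetic set should receive the larger divergence. Calling the returned `ratio` function on individual records gives a rough local view of which regions the synthetic data under-represent (ratio well above 1) or over-represent (ratio near 0).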
Taxonomy
Topics: demographic modeling and climate adaptation · Statistical Methods and Inference
