SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data
Anton Danholt Lautrup, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp

TL;DR
SynthEval is an open-source framework that provides comprehensive, customizable evaluation of tabular synthetic data's utility and privacy, handling categorical and numerical data equally without special preprocessing.
Contribution
The paper introduces a versatile, extensible evaluation framework that improves benchmarking and comparison of synthetic tabular data in terms of utility and privacy.
Findings
Effective in evaluating data fidelity and privacy risks
Handles categorical and numerical data equally
Supports customizable metrics and benchmarking
Abstract
With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, having robust tools for assessing the utility and potential privacy risks of such data becomes crucial. SynthEval, a novel open-source evaluation framework, distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any special kind of preprocessing steps. This makes it applicable to virtually any synthetic dataset of tabular records. Our tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and can easily be extended with additional metrics. In this paper,…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Digital and Cyber Forensics
