SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data

Anton Danholt Lautrup; Tobias Hyrup; Arthur Zimek; Peter Schneider-Kamp

arXiv:2404.15821·cs.LG·December 5, 2024·2 cites

TL;DR

SynthEval is an open-source framework that provides comprehensive, customizable evaluation of tabular synthetic data's utility and privacy, handling categorical and numerical data equally without special preprocessing.

Contribution

It introduces a versatile, extensible evaluation framework that improves benchmarking and comparison of synthetic tabular data in terms of utility and privacy.

Findings

01 Effective at evaluating data fidelity and privacy risks

02 Handles categorical and numerical data equally

03 Supports customizable metrics and benchmarking

Abstract

With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, having robust tools for assessing the utility and potential privacy risks of such data becomes crucial. SynthEval, a novel open-source evaluation framework, distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any special kind of preprocessing steps. This makes it applicable to virtually any synthetic dataset of tabular records. Our tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and can easily be extended with additional metrics. In this paper,…
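To make the idea of treating categorical and numerical attributes with equal care more concrete, here is a minimal sketch of a mixed-type privacy check: the distance from each synthetic record to its closest real record, computed with a Gower-style distance. This is an illustration under stated assumptions, not SynthEval's own code or API; the function names (`column_ranges`, `gower_distance`, `distance_to_closest_record`) and the toy records are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Union

Value = Union[int, float, str]
Record = Sequence[Value]


def column_ranges(records: Sequence[Record], num_cols: Set[int]) -> Dict[int, float]:
    """Range (max - min) of each numerical column; 1.0 if the column is constant."""
    ranges = {}
    for j in num_cols:
        values = [float(r[j]) for r in records]
        ranges[j] = (max(values) - min(values)) or 1.0
    return ranges


def gower_distance(a: Record, b: Record, num_cols: Set[int],
                   ranges: Dict[int, float]) -> float:
    """Gower-style distance: range-scaled absolute difference for numerical
    columns, 0/1 mismatch for categorical columns, averaged over all columns."""
    total = 0.0
    for j in range(len(a)):
        if j in num_cols:
            total += abs(float(a[j]) - float(b[j])) / ranges[j]
        else:
            total += 0.0 if a[j] == b[j] else 1.0
    return total / len(a)


def distance_to_closest_record(real: Sequence[Record], synth: Sequence[Record],
                               num_cols: Set[int]) -> List[float]:
    """For each synthetic record, the smallest distance to any real record.
    Values near zero suggest the record may be a near-copy of a real one."""
    ranges = column_ranges(list(real) + list(synth), num_cols)
    return [min(gower_distance(s, r, num_cols, ranges) for r in real)
            for s in synth]


# Hypothetical toy data: column 0 is numerical (age), column 1 is categorical.
real_rows = [(30, "A"), (50, "B")]
synth_rows = [(30, "A"), (40, "B")]
dcr = distance_to_closest_record(real_rows, synth_rows, num_cols={0})
print(dcr)
```

Because both attribute types contribute a per-column value between 0 and 1 here, neither dominates the distance and no one-hot encoding or normalisation step is needed beforehand. The first synthetic row is an exact copy of a real row, so its distance is zero.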

Code & Models

Repositories

schneiderkamplab/syntheval (official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Digital and Cyber Forensics