Enabling Synthetic Data adoption in regulated domains
Giorgio Visani, Giacomo Graffi, Mattia Alfero, Enrico Bagli, Davide Capuzzo, Federico Chesani

TL;DR
This paper introduces DAISYnt, a comprehensive evaluation suite for synthetic data quality and privacy. By addressing evaluation challenges systematically, it aims to ease the adoption of synthetic data in regulated sectors.
Contribution
It systematically catalogs synthetic data traits and develops DAISYnt, a standardized methodology for assessing synthetic data quality and privacy in regulated domains.
Findings
DAISYnt provides a reliable framework for synthetic data evaluation.
The best generative model was identified using DAISYnt on real-world credit data.
DAISYnt enables auditing and fine-tuning of synthetic data generators.
Abstract
The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to address the privacy issue have been developed, such as Privacy Enhancing Technologies. However, they frequently cause loss of information, putting forward a crucial trade-off between data quality and privacy. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process that learns the properties of the real data. Both Academia and Industry have realized the importance of evaluating synthetic data quality: without all-round reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · 3D Modeling in Geospatial Applications · Big Data Technologies and Applications
