Quality Degradation Attack in Synthetic Data
Qinyi Liu, Dong Liu, Sam Urmian, Mohammad Khalil, and Pedro P. Vergara Barrios

TL;DR
This paper explores how adversaries with access to the real data or control over the generation process can intentionally degrade the quality of synthetic data, revealing vulnerabilities in current synthetic data generation (SDG) methods.
Contribution
It formalizes a threat model for quality degradation attacks and empirically demonstrates their impact on synthetic data utility and statistical properties.
Findings
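Small perturbations of the real data can substantially reduce downstream predictive performance.
Targeted manipulations (e.g., label flipping and feature-importance-based interventions) increase statistical divergence.
Current synthetic data generation pipelines are vulnerable to these attacks.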
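One way to make "statistical divergence" concrete is to compare the distribution of a single categorical column in the real and synthetic data. The summary does not say which divergence measure the paper uses, so the sketch below assumes Jensen-Shannon divergence purely for illustration; the function name `js_divergence` and the example values are hypothetical.

```python
import math
from collections import Counter


def js_divergence(real_values, synthetic_values):
    """Jensen-Shannon divergence (base 2) between two categorical samples.

    Assumption: JS divergence is an illustrative choice, not necessarily
    the measure used in the paper. Returns a value in [0, 1].
    """
    if not real_values or not synthetic_values:
        raise ValueError("both samples must be non-empty")
    p_counts = Counter(real_values)
    q_counts = Counter(synthetic_values)
    n_p, n_q = len(real_values), len(synthetic_values)
    total = 0.0
    for category in set(p_counts) | set(q_counts):
        p = p_counts[category] / n_p
        q = q_counts[category] / n_q
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            total += 0.5 * p * math.log2(p / m)
        if q > 0:
            total += 0.5 * q * math.log2(q / m)
    return total


# Hypothetical column values, for illustration only.
real_column = ["a", "a", "b", "b"]
same_column = ["a", "b", "a", "b"]
disjoint_column = ["c", "c", "c", "c"]

identical_score = js_divergence(real_column, same_column)
disjoint_score = js_divergence(real_column, disjoint_column)
```

A score near 0 means the synthetic column matches the real one closely; a score near 1 means the two distributions barely overlap. A quality degradation attack would be expected to push this score upward.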
Abstract
Synthetic Data Generation (SDG) can be used to facilitate privacy-preserving data sharing. However, most existing research focuses on privacy attacks where the adversary is the recipient of the released synthetic data and attempts to infer sensitive information from it. This study investigates quality degradation attacks initiated by adversaries who possess access to the real dataset or control over the generation process, such as the data owner, the synthetic data provider, or potential intruders. We formalize a corresponding threat model and empirically evaluate the effectiveness of targeted manipulations of real data (e.g., label flipping and feature-importance-based interventions) on the quality of generated synthetic data. The results show that even small perturbations can substantially reduce downstream predictive performance and increase statistical divergence, exposing…
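The label flipping manipulation mentioned in the abstract can be sketched as follows. This is a minimal illustration that assumes binary 0/1 labels and uniformly random selection of the records to flip; the abstract does not specify how the paper selects records, and the names `flip_labels` and `flip_fraction` are hypothetical.

```python
import random


def flip_labels(labels, flip_fraction, seed=0):
    """Return a copy of binary `labels` with a fraction of entries flipped.

    Assumptions (not from the paper): labels are 0 or 1, and the records
    to flip are chosen uniformly at random.
    """
    if not 0.0 <= flip_fraction <= 1.0:
        raise ValueError("flip_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_flip = int(round(flip_fraction * len(labels)))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


# Hypothetical real labels: 100 records, balanced classes.
real_labels = [0, 1] * 50
poisoned_labels = flip_labels(real_labels, flip_fraction=0.05)
n_changed = sum(a != b for a, b in zip(real_labels, poisoned_labels))
```

In the threat model described above, the perturbed labels would replace the real ones before the generator is trained, so the degradation is carried into the synthetic data that is later released.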
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Blockchain Technology Applications and Security
