MC-GEN: Multi-level Clustering for Private Synthetic Data Generation
Mingchen Li, Di Zhuang, and J. Morris Chang

TL;DR
MC-GEN is a novel multi-level clustering approach that generates private synthetic datasets with differential privacy guarantees, improving utility for machine learning classification tasks compared to existing methods.
Contribution
The paper introduces MC-GEN, a new differential privacy-based synthetic data generation method utilizing multi-level clustering to enhance data utility.
Findings
MC-GEN achieves high utility under privacy constraints.
It outperforms three existing synthetic data generation methods.
It is effective across multiple classification tasks.
Abstract
With the development of machine learning and data science, data sharing between companies and research institutes has become common as a way to avoid data scarcity. However, sharing original datasets that contain private information can cause privacy leakage. A reliable solution is to use private synthetic datasets that preserve statistical information from the original datasets. In this paper, we propose MC-GEN, a privacy-preserving synthetic data generation method under a differential privacy guarantee for machine learning classification tasks. MC-GEN applies multi-level clustering and a differentially private generative model to improve the utility of synthetic data. In the experimental evaluation, we evaluated the effects of parameters and the effectiveness of MC-GEN. The results showed that MC-GEN can achieve significant effectiveness under certain privacy guarantees on multiple classification…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management
