SMOTE-DP: Improving Privacy-Utility Tradeoff with Synthetic Data
Yan Zhou, Bradley Malin, Murat Kantarcioglu

TL;DR
This paper introduces SMOTE-DP, a method combining synthetic minority over-sampling with differential privacy to enhance privacy protection while maintaining data utility in synthetic data sharing.
Contribution
The paper proposes a novel SMOTE-DP technique that integrates SMOTE with differential privacy, improving privacy-utility trade-off in synthetic data generation.
Findings
SMOTE-DP achieves strong privacy guarantees.
Synthetic data maintains utility in downstream tasks.
Theoretical and empirical validation supports effectiveness.
Abstract
Privacy-preserving data publication, including synthetic data sharing, often experiences trade-offs between privacy and utility. Synthetic data is generally more effective than data anonymization in balancing this trade-off, though it is not without its own challenges. Synthetic data produced by generative models trained on source data may inadvertently reveal information about outliers. Techniques specifically designed for preserving privacy, such as introducing noise to satisfy differential privacy, often incur unpredictable and significant losses in utility. In this work, we show that, with the right mechanism of synthetic data generation, we can achieve strong privacy protection without significant utility loss. Synthetic data generators producing contracting data patterns, such as Synthetic Minority Over-sampling Technique (SMOTE), can enhance a differentially private data generator,…
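To illustrate what a "contracting" data pattern means for SMOTE, here is a minimal sketch of standard SMOTE-style interpolation only. It is not the authors' SMOTE-DP method and contains no differential privacy step. The function name `smote_sample`, its parameters, and the toy data are assumptions made for illustration. Each synthetic point lies on the segment between a real point and one of its nearest neighbours, so generated points never fall outside the region spanned by the source data.

```python
import math
import random


def smote_sample(points, k=3, n_new=5, seed=0):
    """Return n_new synthetic points made by SMOTE-style interpolation.

    Hypothetical helper for illustration; plain SMOTE, no privacy noise.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # k nearest neighbours of the base point, excluding the base itself
        others = [p for p in points if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        # The new point sits on the segment from base towards neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class data (assumed for illustration)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
new_points = smote_sample(minority, k=3, n_new=10, seed=42)
```

Because every coordinate is a convex combination of two source coordinates, the synthetic points stay inside the bounding box of the source data, which is the contracting behaviour the abstract refers to.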
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Data Quality and Management
