Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control
Tânia Carvalho, Nuno Moniz, Luís Antunes, Nitesh Chawla

TL;DR
This paper introduces ε-PrivateSMOTE, a new differentially private synthetic data generation method that effectively reduces re-identification risk while maintaining high data utility and computational efficiency.
Contribution
The paper presents ε-PrivateSMOTE, a novel synthetic data generation technique that combines noise-induced interpolation with differential privacy, offering a resource-efficient alternative to existing methods.
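The noise-induced interpolation idea can be illustrated with a minimal Python sketch. It assumes a SMOTE-style step (a synthetic point placed at a random position on the segment between a record and its nearest neighbor) followed by per-feature Laplace noise with scale sensitivity/ε, as in the standard Laplace mechanism. The helpers `nearest_neighbor` and `noisy_interpolate` and the `sensitivity` parameter are hypothetical; this is not the authors' exact formulation and does not by itself establish a formal differential privacy guarantee.

```python
import numpy as np


def nearest_neighbor(x, candidates):
    """Return the candidate row closest to x in Euclidean distance (hypothetical helper)."""
    pts = np.asarray(candidates, dtype=float)
    dists = np.linalg.norm(pts - np.asarray(x, dtype=float), axis=1)
    return pts[np.argmin(dists)]


def noisy_interpolate(x, neighbor, epsilon, sensitivity=1.0, rng=None):
    """Place a synthetic point between x and neighbor, then add Laplace noise.

    Assumption: noise scale = sensitivity / epsilon (Laplace mechanism); the
    way epsilon-PrivateSMOTE actually injects noise may differ.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    neighbor = np.asarray(neighbor, dtype=float)
    # SMOTE-style step: a uniform random position on the segment x -> neighbor
    lam = rng.uniform(0.0, 1.0)
    # Laplace noise per feature; a smaller epsilon gives a larger scale, i.e. more noise
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=x.shape)
    return x + lam * (neighbor - x) + noise


# Usage: derive a noisy synthetic point from a record and its closest neighbor
records = np.array([[25.0, 1.2], [27.0, 1.5], [60.0, 3.0]])
target = records[0]
nn = nearest_neighbor(target, records[1:])
print(noisy_interpolate(target, nn, epsilon=1.0, rng=np.random.default_rng(42)))
```

Lowering ε widens the Laplace distribution, so the synthetic point drifts further from the original segment: stronger obfuscation at the cost of utility.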
Findings
Achieves competitive privacy risk reduction compared to state-of-the-art methods.
Improves computational efficiency by at least a factor of 9.
Maintains high predictive performance with lower resource requirements.
Abstract
Protecting user data privacy can be achieved via many methods, from statistical transformations to generative models. However, all of them have critical drawbacks. For example, creating a transformed data set using traditional techniques is highly time-consuming. Also, recent deep learning-based solutions require significant computational resources in addition to long training phases, and differentially private solutions may undermine data utility. In this paper, we propose ε-PrivateSMOTE, a technique designed for safeguarding against re-identification and linkage attacks, particularly addressing cases with a high re-identification risk. Our proposal combines synthetic data generation via noise-induced interpolation with differential privacy principles to obfuscate high-risk cases. We demonstrate how ε-PrivateSMOTE is capable of achieving competitive…
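How high-risk cases are identified is not specified in this summary. One common proxy, assumed here purely for illustration, is k-anonymity: a record is flagged when its combination of quasi-identifiers is shared by fewer than k records. The function `flag_high_risk`, the example columns and the threshold `k=3` are hypothetical.

```python
from collections import Counter


def flag_high_risk(records, quasi_identifiers, k=3):
    """Return indices of records whose quasi-identifier combination occurs fewer than k times.

    Assumption: 'high re-identification risk' is approximated by k-anonymity
    (equivalence classes smaller than k); k=3 is a hypothetical default.
    """
    # Build the quasi-identifier key of each record
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    # Size of each equivalence class
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] < k]


data = [
    {"age": 34, "zip": "4000", "income": 1200},
    {"age": 34, "zip": "4000", "income": 1500},
    {"age": 34, "zip": "4000", "income": 900},
    {"age": 71, "zip": "4100", "income": 2000},
]
print(flag_high_risk(data, ["age", "zip"]))  # [3]
```

Under this reading, only the flagged rows would be replaced by synthetic, noise-perturbed counterparts, leaving low-risk records untouched.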
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
