Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control

Tânia Carvalho; Nuno Moniz; Luís Antunes; Nitesh Chawla

arXiv:2212.00484 · cs.LG · April 24, 2024

TL;DR

This paper introduces $\epsilon$-PrivateSMOTE, a new differentially private synthetic data generation method that effectively reduces re-identification risk while maintaining high data utility and computational efficiency.

Contribution

The paper presents $\epsilon$-PrivateSMOTE, a novel synthetic data generation technique combining noise-induced interpolation with differential privacy, offering a resource-efficient alternative to existing methods.

Findings

01 Achieves competitive privacy risk reduction compared to state-of-the-art methods.

02 Improves computational efficiency by at least a factor of 9.

03 Maintains high predictive performance with lower resource requirements.

Abstract

Protecting user data privacy can be achieved via many methods, from statistical transformations to generative models. However, all of them have critical drawbacks. For example, creating a transformed data set using traditional techniques is highly time-consuming. Also, recent deep learning-based solutions require significant computational resources in addition to long training phases, and differential privacy-based solutions may undermine data utility. In this paper, we propose $\epsilon$-PrivateSMOTE, a technique designed for safeguarding against re-identification and linkage attacks, particularly addressing cases with a high re-identification risk. Our proposal combines synthetic data generation via noise-induced interpolation with differential privacy principles to obfuscate high-risk cases. We demonstrate how $\epsilon$-PrivateSMOTE is capable of achieving competitive…
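
To make the idea of noise-induced interpolation more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a SMOTE-style step in which each record is moved along the line towards one of its k nearest neighbours, with the usual uniform interpolation weight replaced by a Laplace draw of scale 1/epsilon. The function name `noisy_interpolation`, the parameters `k`, `epsilon` and `seed`, and the choice of noise scale are all assumptions made for illustration. The sketch perturbs every row rather than only high-risk cases, and it carries no formal differential privacy guarantee.

```python
import numpy as np


def noisy_interpolation(X, k=3, epsilon=1.0, seed=0):
    """Hypothetical sketch: SMOTE-like interpolation with a Laplace weight.

    For each row x, pick one of its k nearest neighbours nb at random and
    return x + w * (nb - x), where w ~ Laplace(0, 1 / epsilon).
    Smaller epsilon means a larger noise scale (assumed convention).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of rows")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    rng = np.random.default_rng(seed)

    # Brute-force pairwise squared Euclidean distances (fine for small data).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a row is never its own neighbour

    # Indices of the k nearest neighbours of every row.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Choose one neighbour per row uniformly at random.
    pick = rng.integers(0, k, size=n)
    chosen = neighbours[np.arange(n), pick]

    # Laplace-distributed interpolation weight, one per row.
    weight = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=(n, 1))

    return X + weight * (X[chosen] - X)


if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [5.0, 10.0]])
    print(noisy_interpolation(data, k=2, epsilon=1.0, seed=42))
```

Because the Laplace weight is unbounded, a synthetic point may land outside the segment between a record and its neighbour, unlike standard SMOTE, where the weight lies in [0, 1].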

Code & Models

Repositories

tmcarvalho/privatesmote (Official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning