Synthetic Dataset Generation for Privacy-Preserving Machine Learning
Efstathia Soufleri, Gobinda Saha, Kaushik Roy

TL;DR
This paper introduces a novel method to generate synthetic datasets that preserve privacy by matching batch normalization (BN) statistics, enabling training of neural networks without exposing sensitive original data.
Contribution
The proposed approach creates privacy-preserving synthetic datasets using BN statistics and pre-trained models, avoiding complex encryption or obfuscation techniques.
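The idea of matching statistics can be illustrated with a deliberately small sketch. It is not the authors' implementation: it assumes a single scalar "channel" with no network at all, and the function names (`batch_stats`, `matching_loss`, `synthesize`), the squared-error loss, and all hyperparameters are hypothetical choices made for illustration. Starting from random noise, it runs gradient descent on a batch of values until the batch mean and variance agree with stored target statistics, which is the same kind of objective one would apply to the activations feeding a BN layer of a pre-trained model.

```python
import random


def batch_stats(xs):
    """Return the (mean, biased variance) of a list of floats."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var


def matching_loss(xs, target_mean, target_var):
    """Squared distance between the batch statistics and the targets."""
    mean, var = batch_stats(xs)
    return (mean - target_mean) ** 2 + (var - target_var) ** 2


def synthesize(target_mean, target_var, batch_size=32, steps=500, lr=0.02, seed=0):
    """Optimize a batch of noise so its statistics match the targets.

    Assumption: one scalar channel and plain gradient descent; a real
    setting would match per-channel statistics inside a network.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
    for _ in range(steps):
        mean, var = batch_stats(xs)
        d_mean = 2.0 * (mean - target_mean)  # dL/d(mean)
        d_var = 2.0 * (var - target_var)     # dL/d(var)
        # d(mean)/dx_i = 1/n and d(var)/dx_i = 2 * (x_i - mean) / n.
        # The gradient is multiplied by n here so that the step size
        # does not depend on the batch size.
        xs = [x - lr * (d_mean + d_var * 2.0 * (x - mean)) for x in xs]
    return xs


if __name__ == "__main__":
    synthetic = synthesize(target_mean=0.5, target_var=2.0)
    print("batch statistics:", batch_stats(synthetic))
    print("matching loss:", matching_loss(synthetic, 0.5, 2.0))
```

The synthetic batch is built only from the two target numbers and a random seed, so no original sample is read during optimization. Many different batches share the same mean and variance, which is why matching statistics does not by itself reproduce any particular original record.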
Findings
Synthetic data achieves comparable classification accuracy to original data.
Method effectively balances privacy and utility in image classification.
Synthetic datasets can be used for training from scratch with reasonable performance.
Abstract
Machine Learning (ML) has achieved enormous success in solving a variety of problems in computer vision, speech recognition, and object detection, to name a few. The principal reason for this success is the availability of huge datasets for training deep neural networks (DNNs). However, datasets cannot be publicly released if they contain sensitive information such as medical or financial records. In such cases, data privacy becomes a major concern. Encryption methods offer a possible solution to this issue; however, their deployment in ML applications is non-trivial, as they seriously impact the classification accuracy and result in substantial computational overhead. Alternatively, obfuscation techniques can be used, but maintaining a good balance between visual privacy and accuracy is challenging. In this work, we propose a method to generate secure synthetic datasets from the original…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Biometric Identification and Security
Methods: Batch Normalization
