Lossy Compression of Noisy Data for Private and Data-Efficient Learning
Berivan Isik, Tsachy Weissman

TL;DR
This paper introduces a method that combines noise injection with lossy compression to reduce the storage cost and privacy leakage of training data, while maintaining its utility for learning and improving robustness to adversarial test data.
Contribution
It presents a novel framework that matches the lossy compressor to the distribution of the injected noise, preserving data utility while improving privacy and storage efficiency.
Findings
Compressed examples converge in distribution to the noise-free training data as the sample size (or the data dimension) increases
Reduces storage cost and privacy leakage by quantifiable amounts without sacrificing classification accuracy
Enhances robustness against adversarial test data
Abstract
Storage-efficient privacy-preserving learning is crucial due to increasing amounts of sensitive user data required for modern learning tasks. We propose a framework for reducing the storage cost of user data while at the same time providing privacy guarantees, without essential loss in the utility of the data for learning. Our method comprises noise injection followed by lossy compression. We show that, when appropriately matching the lossy compression to the distribution of the added noise, the compressed examples converge, in distribution, to that of the noise-free training data as the sample size of the training data (or the dimension of the training data) increases. In this sense, the utility of the data for learning is essentially maintained, while reducing storage and privacy leakage by quantifiable amounts. We present experimental results on the CelebA dataset for gender…
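The two-stage pipeline described in the abstract (noise injection, then lossy compression) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the uniform scalar quantizer is a stand-in compressor, and "matching" is interpreted here, as an assumption, to mean that the quantizer's mean squared distortion is set approximately equal to the variance of the injected Gaussian noise. The names `inject_noise` and `quantize_matched` are hypothetical.

```python
import math
import random


def inject_noise(x, sigma, rng):
    """Add i.i.d. Gaussian noise with standard deviation sigma to each value."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def quantize_matched(z, sigma):
    """Stand-in lossy compressor (assumption, not the paper's scheme).

    A uniform scalar quantizer with step size `step` has a mean squared
    error of about step**2 / 12 on smooth inputs, so choosing
    step = sigma * sqrt(12) makes the distortion roughly sigma**2.
    """
    step = sigma * math.sqrt(12.0)
    return [step * round(v / step) for v in z]


rng = random.Random(0)
sigma = 0.05

# Synthetic "clean" data in [0, 1), e.g. normalized pixel intensities.
x = [rng.random() for _ in range(20000)]

z = inject_noise(x, sigma, rng)   # privacy step: noisy data
y = quantize_matched(z, sigma)    # storage step: compressed data

# Empirical distortion between the noisy and the compressed data.
distortion = sum((a - b) ** 2 for a, b in zip(y, z)) / len(z)
# Number of distinct reconstruction levels that need to be stored.
levels = len(set(y))

print(f"noise variance: {sigma ** 2:.5f}")
print(f"distortion:     {distortion:.5f}")
print(f"levels used:    {levels}")
```

Under these assumptions the compressed values take only a small number of distinct levels, which is where the storage saving comes from, while the distortion added by compression stays at about the scale of the injected noise.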
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Cryptography and Data Security
