Lossy Compression of Noisy Data for Private and Data-Efficient Learning

Berivan Isik; Tsachy Weissman

arXiv:2202.02892 · cs.IT · March 23, 2023

TL;DR

This paper proposes injecting noise into training data and then applying lossy compression matched to that noise. The combination reduces storage cost and privacy leakage while maintaining the data's utility for learning and improving robustness to adversarial test data.

Contribution

The paper presents a framework in which the lossy compressor is matched to the distribution of the injected noise, preserving the data's utility for learning while reducing storage cost and privacy leakage.

Findings

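01. Compressed examples converge in distribution to the noise-free training data as the sample size (or the data dimension) increases, provided the compressor is matched to the noise distribution.

02. The method reduces storage cost and privacy leakage by quantifiable amounts without sacrificing classification accuracy.

03. The method improves robustness against adversarial test data.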

Abstract

Storage-efficient privacy-preserving learning is crucial due to increasing amounts of sensitive user data required for modern learning tasks. We propose a framework for reducing the storage cost of user data while at the same time providing privacy guarantees, without essential loss in the utility of the data for learning. Our method comprises noise injection followed by lossy compression. We show that, when appropriately matching the lossy compression to the distribution of the added noise, the compressed examples converge, in distribution, to that of the noise-free training data as the sample size of the training data (or the dimension of the training data) increases. In this sense, the utility of the data for learning is essentially maintained, while reducing storage and privacy leakage by quantifiable amounts. We present experimental results on the CelebA dataset for gender…
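The matching idea in the abstract can be illustrated with a textbook special case. The sketch below is not taken from the paper. It assumes a scalar Gaussian source, independent Gaussian noise, and squared-error distortion, and it uses the standard Gaussian rate-distortion result: when the distortion level is set equal to the noise variance, the optimal reconstruction has the same variance as the noise-free source. The function name `matched_gaussian_compression` and the example variances are hypothetical.

```python
import math


def matched_gaussian_compression(source_var, noise_var):
    """Closed-form Gaussian illustration (an assumption, not the paper's code).

    Clean data X ~ N(0, source_var), injected noise N ~ N(0, noise_var),
    noisy data Z = X + N. Z is compressed under squared error at a
    distortion level D equal to the noise variance.
    Returns (rate in bits per sample, variance of the reconstruction).
    """
    if source_var <= 0 or noise_var <= 0:
        raise ValueError("variances must be positive")
    noisy_var = source_var + noise_var      # Var(Z) for independent X and N
    distortion = noise_var                  # distortion matched to the noise level
    # Gaussian rate-distortion function: R(D) = 0.5 * log2(Var(Z) / D)
    rate_bits = 0.5 * math.log2(noisy_var / distortion)
    # The optimal reconstruction is Gaussian with variance Var(Z) - D
    recon_var = noisy_var - distortion
    return rate_bits, recon_var


# Hypothetical example values: source variance 4, noise variance 1.
rate_bits, recon_var = matched_gaussian_compression(4.0, 1.0)
print(f"rate = {rate_bits:.4f} bits/sample, reconstruction variance = {recon_var}")
```

In this toy setting the reconstruction variance equals the source variance, which mirrors the convergence claim in the abstract, and the rate falls as the noise variance grows, which is the storage saving. The paper's actual setting (high-dimensional data, general noise and distortion measures) is not captured by this scalar calculation.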

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Cryptography and Data Security