Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning

Tamas Madl; Weijie Xu; Olivia Choudhury; Matthew Howard

arXiv:2307.01875 · cs.LG · July 6, 2023 · 2 cites

Open Access

TL;DR

The paper introduces 3A, a framework for privacy-preserving data release that aims to maximize machine learning utility while preserving differential privacy. A specific implementation builds on mixture models and Gaussian differential privacy, and models trained on the privatized data perform close to models trained on the real data.

Contribution

The paper proposes 3A, a framework that combines approximation, adaptation, and anonymization to improve the machine learning utility of privacy-preserving data release.
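To make the approximate-then-anonymize idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it stands in a single clipped mean for the mixture model, uses the classic (ε, δ) Gaussian mechanism rather than the paper's privacy accounting, and omits the adapt step entirely. All function names (`gaussian_sigma`, `private_mean`, `release_synthetic`) and parameters (`lo`, `hi`, `spread`) are hypothetical, and the dataset size is assumed to be public.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classic Gaussian mechanism (calibration valid for 0 < epsilon < 1)."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("classic calibration needs 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def private_mean(values, lo, hi, epsilon, delta, rng):
    """Approximate the data by its clipped mean, then anonymize with Gaussian noise."""
    clipped = [min(max(v, lo), hi) for v in values]
    # Assumption: the record count n is public, so replacing one record
    # moves the clipped mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / len(clipped)
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return sum(clipped) / len(clipped) + rng.gauss(0.0, sigma)


def release_synthetic(values, n_out, lo, hi, epsilon, delta, spread, seed=0):
    """Sample synthetic points around the noisy mean; `spread` must not depend on the data."""
    rng = random.Random(seed)
    center = private_mean(values, lo, hi, epsilon, delta, rng)
    # Sampling from the noisy statistic is post-processing of a private output,
    # so it spends no additional privacy budget.
    return [rng.gauss(center, spread) for _ in range(n_out)]


if __name__ == "__main__":
    data_rng = random.Random(1)
    real = [data_rng.gauss(5.0, 1.0) for _ in range(10_000)]
    synthetic = release_synthetic(real, 1000, 0.0, 10.0,
                                  epsilon=0.5, delta=1e-5, spread=1.0)
    print(len(synthetic), round(sum(synthetic) / len(synthetic), 2))
```

Only the noisy mean touches the sensitive records; everything sampled afterwards is derived from that one released statistic. A real release would need a richer approximation (for example, per-component mixture parameters) and a privacy budget split across every statistic that is released.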

Findings

01. Minimal performance discrepancy between models trained on real and privatized data.

02. Significant improvements over existing privacy-preserving synthetic data methods.

03. Framework effectively balances privacy and utility in sensitive data sharing.

Abstract

The availability of large amounts of informative data is crucial for successful machine learning. However, in domains with sensitive information, the release of high-utility data which protects the privacy of individuals has proven challenging. Despite progress in differential privacy and generative modeling for privacy-preserving data release in the literature, only a few approaches optimize for machine learning utility: most approaches only take into account statistical metrics on the data itself and fail to explicitly preserve the loss metrics of machine learning models that are to be subsequently trained on the generated data. In this paper, we introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning, while preserving differential privacy. We also describe a specific implementation of this framework that leverages mixture…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques