Generative Models for Effective ML on Private, Decentralized Datasets

Sean Augenstein; H. Brendan McMahan; Daniel Ramage; Swaroop Ramaswamy; Peter Kairouz; Mingqing Chen; Rajiv Mathews; Blaise Aguera y Arcas

arXiv:1911.06679 · cs.LG · February 6, 2020 · 43 cites

TL;DR

This paper introduces differentially private federated generative models that enable effective data debugging and analysis in privacy-sensitive and decentralized settings, where raw data cannot be directly accessed.

Contribution

It presents novel federated generative modeling techniques with formal privacy guarantees for debugging and analyzing private, decentralized datasets.

Findings

01 Generative models can identify data issues without direct data access.

02 Federated RNNs and GANs with differential privacy are effective for data debugging.

03 Methods work on both text and image datasets.
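
The findings above refer to federated training with differential privacy. As a rough illustration only, the sketch below shows one server-side aggregation step under a commonly used recipe: clip each client's update to a fixed L2 norm, sum the clipped updates, add Gaussian noise scaled to the clip norm, and average. This page does not specify the mechanism, so the recipe is an assumption, and the names `clip_update`, `dp_fedavg_round`, `clip_norm`, and `noise_multiplier` are hypothetical. The sketch omits client sampling and the privacy accounting needed for a formal guarantee.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_fedavg_round(client_updates, clip_norm, noise_multiplier, rng):
    """Clip each client update, sum, add Gaussian noise, then average.

    Assumption: the noise standard deviation on the sum is
    noise_multiplier * clip_norm (both names are illustrative).
    """
    n = len(client_updates)
    dim = len(client_updates[0])
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    summed = [sum(u[i] for u in clipped) for i in range(dim)]
    stddev = noise_multiplier * clip_norm
    noised = [s + rng.gauss(0.0, stddev) for s in summed]
    return [v / n for v in noised]


# Toy model updates from three simulated clients (made-up values).
rng = random.Random(0)
updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]

# With no noise, the result is just the mean of the clipped updates.
avg_no_noise = dp_fedavg_round(updates, clip_norm=1.0, noise_multiplier=0.0, rng=rng)

# With noise, each coordinate of the sum is perturbed before averaging.
avg_noised = dp_fedavg_round(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single client can move the aggregate, which is what lets the added noise mask an individual contribution. The raw client updates never need to be inspected by the modeler.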

Abstract

To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data - of representative samples, of outliers, of misclassifications - is an essential tool in a) identifying and fixing problems in the data, b) generating new modeling hypotheses, and c) assigning or refining human-provided labels. However, manual data inspection is problematic for privacy sensitive datasets, such as those representing the behavior of real-world individuals. Furthermore, manual data inspection is impossible in the increasingly important setting of federated learning, where raw examples are stored at the edge and the modeler may only access aggregated outputs such as metrics or model parameters. This paper demonstrates that generative models - trained using federated methods and with formal…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning