Generative Models for Effective ML on Private, Decentralized Datasets
Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, Blaise Aguera y Arcas

TL;DR
This paper introduces differentially private federated generative models that enable effective data debugging and analysis in privacy-sensitive and decentralized settings, where raw data cannot be directly accessed.
Contribution
It presents novel federated generative modeling techniques with formal privacy guarantees for debugging and analyzing private, decentralized datasets.
Findings
Generative models can identify data issues without direct data access.
Federated RNNs and GANs with differential privacy are effective for data debugging.
Methods work on both text and image datasets.
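As a rough illustration of what "federated with differential privacy" can mean in practice, the sketch below shows the generic clip-and-noise aggregation step often used with federated averaging. It is not taken from the paper: the function names (`clip_update`, `dp_federated_average`), the parameters (`clip_norm`, `noise_multiplier`), and the toy updates are all assumptions made for illustration, and a real system would also need privacy accounting to turn the noise level into a formal guarantee.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_federated_average(client_updates, clip_norm, noise_multiplier, rng):
    """Average clipped client updates, adding Gaussian noise to the sum.

    Hypothetical helper: clipping bounds each client's contribution, and
    the noise standard deviation is proportional to that bound.
    """
    n = len(client_updates)
    dim = len(client_updates[0])
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    summed = [sum(u[i] for u in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clip bound
    noisy_sum = [s + rng.gauss(0.0, sigma) for s in summed]
    return [s / n for s in noisy_sum]


# Toy model updates from three simulated clients (illustrative values only).
rng = random.Random(0)
updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
print(dp_federated_average(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Only the noisy aggregate leaves this step; individual client updates are never returned, which mirrors the setting where the modeler sees aggregated outputs rather than raw examples.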
Abstract
To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data - of representative samples, of outliers, of misclassifications - is an essential tool in a) identifying and fixing problems in the data, b) generating new modeling hypotheses, and c) assigning or refining human-provided labels. However, manual data inspection is problematic for privacy sensitive datasets, such as those representing the behavior of real-world individuals. Furthermore, manual data inspection is impossible in the increasingly important setting of federated learning, where raw examples are stored at the edge and the modeler may only access aggregated outputs such as metrics or model parameters. This paper demonstrates that generative models - trained using federated methods and with formal…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
