Understanding Unintended Memorization in Federated Learning

Om Thakkar; Swaroop Ramaswamy; Rajiv Mathews; Françoise Beaufays

arXiv:2006.07490 · cs.LG · June 16, 2020 · 20 cites

TL;DR

This paper studies how the components of federated learning affect unintended memorization in trained models. It finds that clustering data by user, training with federated averaging, and applying differential privacy each reduce memorization of sensitive data.

Contribution

The paper provides a formal analysis of how the components of federated learning affect unintended memorization, compared with central learning, and highlights the roles of data clustering, federated averaging, and differential privacy.

Findings

01. Data clustering reduces memorization.

02. Federated averaging further decreases memorization.

03. Differential privacy leads to models with minimal memorization.
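
To make the components named in the findings concrete, here is a minimal sketch of one federated averaging round, with optional per-user clipping and Gaussian noise of the kind used for user-level differential privacy. It is an illustration under stated assumptions, not the authors' implementation: the toy one-parameter model, the uniform user weighting (federated averaging is often described with weights proportional to each user's example count), and the names `local_update`, `clip`, `fedavg_round`, `clip_norm`, and `noise_multiplier` are all hypothetical. The sketch also does no privacy accounting, so it does not by itself yield a formal guarantee.

```python
import math
import random


def local_update(weights, user_data, lr=0.1):
    """Run one pass of local SGD for a toy model y ~ w * x on one user's data.

    Returns the model delta (local weights minus the starting weights).
    """
    w = list(weights)
    for x, y in user_data:
        grad = 2.0 * (w[0] * x - y) * x  # gradient of the squared error
        w[0] -= lr * grad
    return [wi - w0 for wi, w0 in zip(w, weights)]


def clip(delta, clip_norm):
    """Scale delta so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [d * scale for d in delta]


def fedavg_round(weights, users, clip_norm=None, noise_multiplier=0.0, rng=None):
    """One round: each user trains locally, the server averages the deltas.

    Assumption: every user participates and all users are weighted equally.
    If clip_norm is set, each user's delta is clipped before averaging; if
    noise_multiplier > 0 as well, Gaussian noise scaled to the clipped
    per-user contribution is added to the average.
    """
    rng = rng or random
    deltas = []
    for user_data in users:  # data stays grouped by user
        delta = local_update(weights, user_data)
        if clip_norm is not None:
            delta = clip(delta, clip_norm)
        deltas.append(delta)

    n = len(deltas)
    avg = [sum(d[i] for d in deltas) / n for i in range(len(weights))]

    if clip_norm is not None and noise_multiplier > 0:
        sigma = noise_multiplier * clip_norm / n  # noise std on the average
        avg = [a + rng.gauss(0.0, sigma) for a in avg]

    return [w + a for w, a in zip(weights, avg)]


# Toy usage: three users whose examples all follow y = 2x.
users = [[(1.0, 2.0), (0.5, 1.0)], [(1.5, 3.0)], [(1.0, 2.0)]]
weights = [0.0]
for _ in range(50):
    weights = fedavg_round(weights, users)
```

In this sketch the server only ever sees one delta per user, which is the sense in which data is grouped by user. Clipping bounds how far any single user can move the model in a round, and the added noise is calibrated to that bound.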

Abstract

Recent works have shown that generative sequence models (e.g., language models) have a tendency to memorize rare or unique sequences in the training data. Since useful models are often trained on sensitive data, to ensure the privacy of the training data it is critical to identify and mitigate such unintended memorization. Federated Learning (FL) has emerged as a novel framework for large-scale distributed learning tasks. However, it differs in many aspects from the well-studied central learning setting where all the data is stored at the central server. In this paper, we initiate a formal study to understand the effect of different components of canonical FL on unintended memorization in trained models, comparing with the central learning setting. Our results show that several differing components of FL play an important role in reducing unintended memorization. Specifically, we…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques