Recovering Private Text in Federated Learning of Language Models

Samyak Gupta; Yangsibo Huang; Zexuan Zhong; Tianyu Gao; Kai Li; Danqi Chen

arXiv:2205.08514 · cs.CL · October 19, 2022 · 22 cites

TL;DR

This paper introduces FILM, a novel attack method demonstrating the feasibility of recovering private text data from federated learning of language models, highlighting privacy vulnerabilities and proposing effective defenses.

Contribution

The paper presents the first method shown to recover text from large batches (up to 128 sentences) in federated learning of language models, and it evaluates defense strategies that mitigate this privacy risk.

Findings

01 High-fidelity text recovery from large batch sizes

02 Effective defense via fine-tuning pre-trained models without updating embeddings

03 Gradient pruning and DPSGD reduce attack success but degrade model utility
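
As an illustration of the gradient-pruning defense named in the findings, the sketch below zeroes out the smallest-magnitude entries of a gradient before it would be shared. It is a minimal, generic version of magnitude pruning, not the paper's implementation; the function name `prune_gradient` and the `prune_ratio` parameter are hypothetical.

```python
def prune_gradient(grad, prune_ratio):
    """Zero out the smallest-magnitude fraction of entries in a flat gradient.

    `grad` is a list of floats; `prune_ratio` is the fraction to drop.
    Both names are illustrative assumptions, not taken from the paper.
    """
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be in [0, 1]")
    k = int(len(grad) * prune_ratio)  # number of entries to zero out
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else g for i, g in enumerate(grad)]


grad = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2]
pruned = prune_gradient(grad, 0.5)
print(pruned)
```

A higher `prune_ratio` hides more of the gradient from an eavesdropper but also discards more of the training signal, which is the utility trade-off the finding refers to.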

Abstract

Federated learning allows distributed users to collaboratively train a model while keeping each user's data private. Recently, a growing body of work has demonstrated that an eavesdropping attacker can effectively recover image data from gradients transmitted during federated learning. However, little progress has been made in recovering text data. In this paper, we present a novel attack method FILM for federated learning of language models (LMs). For the first time, we show the feasibility of recovering text from large batch sizes of up to 128 sentences. Unlike image-recovery methods that are optimized to match gradients, we take a distinct approach that first identifies a set of words from gradients and then directly reconstructs sentences based on beam search and a prior-based reordering strategy. We conduct the FILM attack on several large-scale datasets and show that it can…
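
The first stage described in the abstract, identifying a set of words from gradients, can be illustrated with a toy example. The sketch below assumes a simplified setting in which only the rows of an input-embedding gradient that correspond to tokens in the batch are non-zero; the helper names, the tiny vocabulary, and the random gradient values are all hypothetical, and this is not the FILM code. Real models with tied input and output embeddings, or with noisy gradients, may need a magnitude threshold rather than an exact zero test.

```python
import random


def embedding_gradient(batch, vocab_size, dim, seed=0):
    """Toy stand-in for the gradient of an input-embedding matrix.

    Only rows whose token id occurs in the batch receive a contribution;
    every other row stays exactly zero. The values are random placeholders.
    """
    rng = random.Random(seed)
    grad = [[0.0] * dim for _ in range(vocab_size)]
    for sentence in batch:
        for token_id in sentence:
            for j in range(dim):
                # Positive contributions, so repeated tokens cannot cancel out.
                grad[token_id][j] += rng.uniform(0.5, 1.0)
    return grad


def recover_token_set(grad, tol=0.0):
    """Return the ids of rows whose L1 norm exceeds `tol`."""
    return {i for i, row in enumerate(grad) if sum(abs(x) for x in row) > tol}


vocab = ["<pad>", "the", "cat", "sat", "dog", "ran", "fast"]
batch = [[1, 2, 3], [1, 4, 5]]  # "the cat sat", "the dog ran"
grad = embedding_gradient(batch, len(vocab), dim=4)
recovered = sorted(vocab[i] for i in recover_token_set(grad))
print(recovered)  # ['cat', 'dog', 'ran', 'sat', 'the']
```

The result is an unordered bag of words: it reveals which tokens appeared in the batch but not their order or sentence membership, which is why the abstract describes a separate reconstruction stage based on beam search and reordering.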

Code & Models

Repositories

princeton-sysml/film (PyTorch, official)

Videos

Recovering Private Text in Federated Learning of Language Models · SlidesLive

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning

Methods: Pruning