LAMP: Extracting Text from Gradients with Language Model Priors

Mislav Balunović; Dimitar I. Dimitrov; Nikola Jovanović; Martin Vechev

arXiv:2202.08827 · cs.LG · October 20, 2022 · 21 cites

TL;DR

LAMP is an attack that reconstructs original text from gradient updates in federated learning. It combines a language model prior with alternating continuous and discrete optimization, and its results reveal significant privacy risks for textual data.

Contribution

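The paper introduces LAMP, an attack tailored to textual data that recovers inputs from gradients by combining a language model prior with alternating continuous and discrete optimization. It extends gradient-leakage privacy analysis, previously demonstrated primarily on images, to text.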

Findings

01

LAMP reconstructs 5x more bigrams than prior methods.

02

It recovers subsequences that are 23% longer on average.

03

It is the first attack to recover text inputs from batches larger than 1.
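
As a rough illustration of what "more bigrams" and "longer subsequences" could mean in practice, the sketch below scores a reconstruction against a reference sentence. The summary above does not state the exact metrics used, so `bigram_recall` and `lcs_length` are hypothetical helpers chosen for illustration, and the example sentences are made up.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def bigrams(tokens: Sequence[str]) -> List[Tuple[str, str]]:
    """Return all adjacent token pairs in order."""
    return list(zip(tokens, tokens[1:]))


def bigram_recall(reference: Sequence[str], reconstruction: Sequence[str]) -> float:
    """Fraction of reference bigrams that also occur in the reconstruction."""
    ref = Counter(bigrams(reference))
    rec = Counter(bigrams(reconstruction))
    if not ref:
        return 0.0
    overlap = sum((ref & rec).values())  # multiset intersection
    return overlap / sum(ref.values())


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


# Made-up example: the reconstruction misplaces one word.
reference = "the movie was surprisingly good".split()
reconstruction = "the movie good was surprisingly".split()
print(bigram_recall(reference, reconstruction))  # 2 of 4 reference bigrams match
print(lcs_length(reference, reconstruction))     # "the movie was surprisingly"
```

A reconstruction with the right words in the wrong order scores poorly on the bigram measure but can still share a long subsequence with the reference, which is why the two numbers are reported separately.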

Abstract

Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains such as text. In this work, we propose LAMP, a novel attack tailored to textual data, that successfully reconstructs original text from gradients. Our attack is based on two key insights: (i) modeling prior text probability with an auxiliary language model, guiding the search towards more natural text, and (ii) alternating continuous and discrete optimization, which minimizes reconstruction loss on embeddings, while avoiding local minima by applying discrete text transformations. Our experiments demonstrate that LAMP is significantly more effective than prior work: it reconstructs 5x more bigrams and 23% longer subsequences on average.…
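
The two insights in the abstract can be sketched as a toy loop: continuous steps lower a gradient-matching loss on candidate embeddings, and discrete steps pick the best reordering under that loss plus a language model penalty. Everything below is an assumption made for illustration and not the authors' implementation: the model is a tiny linear regressor, a random bigram table (`BIGRAM_LOGP`) stands in for the auxiliary language model, position reordering is the only discrete transformation, and the continuous step uses finite differences instead of automatic differentiation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, SEQ = 6, 4, 3
EMB = rng.normal(size=(VOCAB, DIM))            # toy token embedding table (assumption)
POS = np.array([1.0, 2.0, 3.0])                # toy position weights so order matters
W = rng.normal(size=DIM)                       # toy model weights (assumption)
BIGRAM_LOGP = rng.normal(size=(VOCAB, VOCAB))  # stand-in for LM log-probabilities


def model_gradient(x, y=1.0):
    """Gradient of 0.5 * (W.h - y)^2 w.r.t. W, where h = sum_i POS[i] * x[i]."""
    h = POS @ x
    return (W @ h - y) * h


def rec_loss(x, target_grad):
    """Squared distance between the candidate's gradient and the observed one."""
    return float(np.sum((model_gradient(x) - target_grad) ** 2))


def nearest_tokens(x):
    """Project each embedding row to the id of its closest vocabulary entry."""
    dist = ((x[:, None, :] - EMB[None, :, :]) ** 2).sum(-1)
    return dist.argmin(axis=1)


def lm_penalty(x):
    """Negative toy bigram log-probability of the projected token sequence."""
    t = nearest_tokens(x)
    return float(-sum(BIGRAM_LOGP[a, b] for a, b in zip(t, t[1:])))


def objective(x, target_grad, alpha=0.1):
    return rec_loss(x, target_grad) + alpha * lm_penalty(x)


def continuous_step(x, target_grad, lr=0.01, eps=1e-5):
    """One finite-difference descent step, kept only if the loss decreases."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        d = np.zeros_like(x)
        d[idx] = eps
        grad[idx] = (rec_loss(x + d, target_grad) - rec_loss(x - d, target_grad)) / (2 * eps)
    candidate = x - lr * grad
    return candidate if rec_loss(candidate, target_grad) < rec_loss(x, target_grad) else x


def discrete_step(x, target_grad, alpha=0.1):
    """Try every reordering of positions and keep the best under the objective."""
    candidates = [x[list(p)] for p in itertools.permutations(range(len(x)))]
    return min(candidates, key=lambda c: objective(c, target_grad, alpha))


def attack(target_grad, n_outer=5, n_inner=20):
    """Alternate blocks of continuous steps with one discrete step."""
    x = rng.normal(size=(SEQ, DIM))
    for _ in range(n_outer):
        for _ in range(n_inner):
            x = continuous_step(x, target_grad)
        x = discrete_step(x, target_grad)
    return x


true_tokens = np.array([2, 0, 5])
target = model_gradient(EMB[true_tokens])      # the "observed" gradient update
x_hat = attack(target)
print(nearest_tokens(x_hat), rec_loss(x_hat, target))
```

This toy model is far too small to identify the true tokens, since many embeddings produce the same gradient. The point is the control flow: the discrete step can jump to an ordering that plain gradient descent would not reach, while the penalty term biases the choice toward sequences the prior considers likely.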

Code & Models

Videos

LAMP: Extracting Text from Gradients with Language Model Priors · SlidesLive

Taxonomy

Topics: Privacy-Preserving Technologies in Data · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning