TextHide: Tackling Data Privacy in Language Understanding Tasks

Yangsibo Huang; Zhao Song; Danqi Chen; Kai Li; Sanjeev Arora

arXiv:2010.06053 · cs.CL · October 14, 2020 · 6 cites

TL;DR

TextHide introduces an efficient encryption method for privacy-preserving natural language understanding that minimally impacts model accuracy and effectively defends against gradient and representation attacks.

Contribution

The paper proposes a simple encryption step, compatible with fine-tuning pre-trained language models, that protects private text data in distributed or federated NLP training.

Findings

01 Effective privacy defense against attacks on shared gradients and representations.

02 Average accuracy reduction of only 1.9% on the GLUE benchmark.

03 Encryption adds negligible overhead to training.

Abstract

An unsolved challenge in distributed or federated learning is to effectively mitigate privacy risks without slowing down training or reducing accuracy. In this paper, we propose TextHide, which aims to address this challenge for natural language understanding tasks. It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data. This encryption step is efficient and affects task performance only slightly. In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task. We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend against attacks on shared gradients or representations, with an average accuracy reduction of only 1.9%. We also present an analysis of the security of TextHide…
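
The abstract does not spell out what the encryption step looks like. As a rough illustration only, the sketch below assumes a mix-and-mask scheme in the style of InstaHide: each sentence representation is linearly mixed with a few randomly chosen others, and the result is multiplied elementwise by a random sign mask drawn from a secret pool. The function names, parameters (`k`, `pool_size`), and the plain-list data layout are hypothetical and are not taken from the official repository.

```python
import random


def make_mask_pool(dim, pool_size, rng):
    """Hypothetical helper: build a secret pool of random +/-1 sign masks."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)]
            for _ in range(pool_size)]


def mix_and_mask(reps, labels, index, k, mask_pool, rng):
    """Assumed encryption sketch: mix example `index` with k-1 others,
    then flip signs using a mask drawn at random from the pool.

    reps   -- list of representation vectors (lists of floats)
    labels -- list of one-hot label vectors (lists of floats)
    """
    candidates = [i for i in range(len(reps)) if i != index]
    chosen = [index] + rng.sample(candidates, k - 1)

    # Random nonnegative mixing coefficients, normalized to sum to 1.
    raw = [rng.random() for _ in chosen]
    total = sum(raw)
    coeffs = [r / total for r in raw]

    dim = len(reps[0])
    mixed_rep = [sum(c * reps[i][d] for c, i in zip(coeffs, chosen))
                 for d in range(dim)]
    # Labels are mixed with the same coefficients (soft labels).
    mixed_label = [sum(c * labels[i][j] for c, i in zip(coeffs, chosen))
                   for j in range(len(labels[0]))]

    # Elementwise sign flip with a randomly selected secret mask.
    mask = rng.choice(mask_pool)
    encrypted = [m * x for m, x in zip(mask, mixed_rep)]
    return encrypted, mixed_label


if __name__ == "__main__":
    rng = random.Random(0)
    reps = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6],
            [0.7, 0.8, 0.9], [-1.0, 0.0, 1.0]]
    labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    pool = make_mask_pool(dim=3, pool_size=4, rng=rng)
    enc, soft = mix_and_mask(reps, labels, index=0, k=2,
                             mask_pool=pool, rng=rng)
    print(enc, soft)
```

In this sketch only the encrypted vectors and soft labels would be shared during training, while the mask pool stays with each participant. The actual procedure, including how representations are produced by the encoder, should be checked against the paper and the official code.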

Code & Models

Repositories

Hazelsuko07/TextHide (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Topic Modeling