Selective Differential Privacy for Language Modeling

Weiyan Shi; Aiqi Cui; Evan Li; Ruoxi Jia; Zhou Yu

arXiv:2108.12944 · cs.CL · July 19, 2022

TL;DR

This paper introduces selective differential privacy, a privacy notion for language models that applies rigorous guarantees only to the sensitive portion of the training data. Restricting protection to that portion improves model utility while keeping the sensitive content protected, as demonstrated in experiments on language modeling and dialog systems.

Contribution

The paper proposes a new privacy notion, selective differential privacy, and develops a corresponding mechanism, Selective-DPSGD, to improve the privacy-utility trade-off in language models.

Findings

01. Selective-DPSGD achieves better model utility than the baseline methods.

02. It maintains privacy under various attack scenarios.

03. It is effective for both language modeling and dialog systems.

Abstract

With the increasing applications of language models, it has become crucial to protect these models from leaking private information. Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees. However, applying classical differential privacy to language models leads to poor model performance as the underlying privacy notion is over-pessimistic and provides undifferentiated protection for all tokens in the data. Given that the private information in natural language is sparse (for example, the bulk of an email might not carry personally identifiable information), we propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data to improve model utility. To realize such a new notion, we develop a corresponding privacy mechanism, Selective-DPSGD,…
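
As a rough illustration of the idea in the abstract, the sketch below marks tokens as sensitive with a toy policy and applies the classical DP-SGD update (per-example gradient clipping plus Gaussian noise) only to gradients from sensitive examples, while the rest take a plain SGD update. The truncated abstract does not spell out how Selective-DPSGD works, so the routing scheme, the digit-based policy, and every function and parameter name here are assumptions made for illustration, not the authors' implementation. The sketch also does no privacy accounting.

```python
import math
import random
import re


def digit_policy(tokens):
    """Hypothetical policy: treat any token containing a digit as sensitive."""
    return [bool(re.search(r"\d", tok)) for tok in tokens]


def clip(grad, clip_norm):
    """Scale a gradient vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [g * scale for g in grad]


def sgd_step(params, grads, lr):
    """Plain SGD: average the per-example gradients and step."""
    mean = [sum(g[j] for g in grads) / len(grads) for j in range(len(params))]
    return [p - lr * m for p, m in zip(params, mean)]


def dpsgd_step(params, grads, lr, clip_norm, noise_multiplier, rng):
    """Classical DP-SGD: clip each gradient, sum, add Gaussian noise, average."""
    clipped = [clip(g, clip_norm) for g in grads]
    sigma = noise_multiplier * clip_norm
    update = []
    for j in range(len(params)):
        total = sum(g[j] for g in clipped) + rng.gauss(0.0, sigma)
        update.append(total / len(grads))
    return [p - lr * u for p, u in zip(params, update)]


def selective_step(params, grads, sensitive, lr, clip_norm, noise_multiplier, rng):
    """Assumed routing: non-sensitive examples use SGD, sensitive ones use DP-SGD."""
    public = [g for g, s in zip(grads, sensitive) if not s]
    private = [g for g, s in zip(grads, sensitive) if s]
    if public:
        params = sgd_step(params, public, lr)
    if private:
        params = dpsgd_step(params, private, lr, clip_norm, noise_multiplier, rng)
    return params


if __name__ == "__main__":
    tokens = "call me at 555 1234 tomorrow".split()
    print(list(zip(tokens, digit_policy(tokens))))

    rng = random.Random(0)
    grads = [[1.0, 0.0], [3.0, 4.0]]  # toy per-example gradients
    print(selective_step([0.0, 0.0], grads, [False, True],
                         lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

The point of the split is that noise and clipping, which are what degrade utility under classical differential privacy, are paid only on the portion of the data the policy flags. How much that helps depends on how sparse the flagged portion is.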

Code & Models

Repositories

wyshi/lm_privacy (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Access Control and Trust