User-Entity Differential Privacy in Learning Natural Language Models

Phung Lai; NhatHai Phan; Tong Sun; Rajiv Jain; Franck Dernoncourt; Jiuxiang Gu; Nikolaos Barmpalios

arXiv:2211.01141 · cs.CR · November 10, 2022

TL;DR

This paper proposes user-entity differential privacy (UeDP), which formally protects both sensitive entities in textual data and the data owners during natural language model training, and introduces an algorithm, UeDP-Alg, that balances privacy loss against model utility.

Contribution

The paper introduces UeDP, a new privacy framework, and develops UeDP-Alg, an algorithm that effectively balances privacy guarantees with model utility in NLP tasks.

Findings

1. UeDP-Alg outperforms baseline methods in utility under the same privacy budget.

2. Theoretical analysis confirms the tight sensitivity bounds of UeDP-Alg.

3. Empirical results on benchmark datasets demonstrate improved privacy-utility trade-offs.

Abstract

In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs). To preserve UeDP, we developed a novel algorithm, called UeDP-Alg, optimizing the trade-off between privacy loss and model utility with a tight sensitivity bound derived from seamlessly combining user and sensitive entity sampling processes. An extensive theoretical analysis and evaluation show that our UeDP-Alg outperforms baseline approaches in model utility under the same privacy budget consumption on several NLM tasks, using benchmark datasets.
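
The abstract describes combining user sampling with sensitive-entity sampling before a noisy aggregation. As a rough illustration only, the following Python sketch shows one training round in that general style. It is not the authors' UeDP-Alg: the function names, the `local_update` callback, and all parameters are hypothetical, and the noise scale uses the simple user-level bound `clip_norm / (q_user * len(users))` from DP-FedAvg-style estimators rather than the UeDP sensitivity bound derived in the paper.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Scale `update` so its L2 norm is at most `clip_norm`."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / max(norm, 1e-12))

def sample(items, prob, rng):
    """Keep each item independently with probability `prob`."""
    return [x for x in items if rng.random() < prob]

def private_round(users, entities, local_update, q_user, q_entity,
                  clip_norm, noise_multiplier, dim, rng):
    """One noisy aggregation round (illustrative, not the paper's algorithm).

    `local_update(user, sampled_entities)` is a hypothetical callback that
    returns a model update computed from that user's data, restricted to
    text whose sensitive entities are all in `sampled_entities`.
    """
    sampled_users = sample(users, q_user, rng)
    sampled_entities = set(sample(entities, q_entity, rng))

    total = np.zeros(dim)
    for user in sampled_users:
        total += clip_update(local_update(user, sampled_entities), clip_norm)

    # Fixed-denominator estimator: divide by the expected number of users.
    denom = q_user * len(users)
    # Assumption: user-level sensitivity only, NOT the paper's UeDP bound.
    sigma = noise_multiplier * clip_norm / denom
    return total / denom + rng.normal(0.0, sigma, size=dim)
```

Clipping bounds how much any single user can shift the aggregate, which is what makes a calibrated Gaussian noise scale meaningful. A real UeDP implementation would replace the `sigma` line with the sensitivity bound from the paper, which also accounts for the sampled sensitive entities.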

Code & Models

Repositories

phunglai728/uedp (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data