User-Entity Differential Privacy in Learning Natural Language Models
Phung Lai, NhatHai Phan, Tong Sun, Rajiv Jain, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios

TL;DR
This paper proposes user-entity differential privacy (UeDP), which formally protects both sensitive entities in textual data and the data owners during natural language model training, and introduces an algorithm, UeDP-Alg, that balances privacy loss against model utility.
Contribution
The paper introduces UeDP, a new privacy framework, and develops UeDP-Alg, an algorithm that effectively balances privacy guarantees with model utility in NLP tasks.
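For orientation, the standard (ε, δ)-differential privacy condition that frameworks of this kind build on is written out below. The summary does not state UeDP's adjacency relation, so the reading given in the comment (datasets differing in one user's data and/or one sensitive entity's occurrences) is an assumption inferred from the name and the abstract, not a quotation of the paper's definition.

```latex
% Standard (\epsilon, \delta)-differential privacy for a randomized mechanism \mathcal{A}.
% Assumption: for UeDP, D and D' are "adjacent" when D' is obtained from D by
% adding or removing all data of one user and/or all occurrences of one sensitive entity.
\[
  \Pr\bigl[\mathcal{A}(D) \in S\bigr]
  \;\le\;
  e^{\epsilon}\,\Pr\bigl[\mathcal{A}(D') \in S\bigr] + \delta
  \qquad \text{for all adjacent } D, D' \text{ and all output sets } S .
\]
```

A smaller ε (the privacy budget) means the two output distributions are harder to tell apart, which is why utility comparisons are made at a fixed budget.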
Findings
UeDP-Alg outperforms baseline methods in utility under the same privacy budget.
Theoretical analysis supports the tight sensitivity bound of UeDP-Alg, which is derived by combining the user and sensitive-entity sampling processes.
Empirical results on benchmark datasets demonstrate improved privacy-utility trade-offs.
Abstract
In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs). To preserve UeDP, we developed a novel algorithm, called UeDP-Alg, optimizing the trade-off between privacy loss and model utility with a tight sensitivity bound derived from seamlessly combining user and sensitive entity sampling processes. An extensive theoretical analysis and evaluation show that our UeDP-Alg outperforms baseline approaches in model utility under the same privacy budget consumption on several NLM tasks, using benchmark datasets.
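The abstract mentions combining user sampling with sensitive-entity sampling. The sketch below illustrates that general idea with a generic sampled, clipped, Gaussian-noised aggregation round (in the style of DP-FedAvg). It is not UeDP-Alg: the function names, the data layout, the toy "update" (a mean of example vectors), and the noise scale `noise_multiplier * clip_norm` are all assumptions made for illustration, and the sketch does not reproduce the paper's sensitivity bound or carry its privacy guarantee.

```python
import numpy as np


def clip_l2(vec, clip_norm):
    """Scale vec so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(vec)
    return vec * min(1.0, clip_norm / max(norm, 1e-12))


def noisy_round(user_data, entities, dim, q_user, q_entity,
                clip_norm, noise_multiplier, rng):
    """One illustrative aggregation round (hypothetical, not the paper's algorithm).

    user_data: dict mapping user id -> list of (vector, set_of_entity_ids).
    entities:  iterable of all sensitive entity ids.
    """
    # Independently sample users and sensitive entities.
    sampled_users = [u for u in sorted(user_data) if rng.random() < q_user]
    sampled_entities = {e for e in sorted(entities) if rng.random() < q_entity}

    total = np.zeros(dim)
    for user in sampled_users:
        # Keep only examples whose sensitive entities were all sampled.
        kept = [vec for vec, ents in user_data[user] if ents <= sampled_entities]
        if not kept:
            continue
        # Toy stand-in for a per-user model update.
        update = np.mean(kept, axis=0)
        total += clip_l2(update, clip_norm)

    # Gaussian noise; the scale here is a placeholder, not a derived bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    # Normalize by the expected number of sampled users.
    return (total + noise) / max(q_user * len(user_data), 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = {
        "alice": [(np.array([3.0, 4.0]), {1})],
        "bob": [(np.array([0.0, 0.5]), set())],
    }
    print(noisy_round(data, {1}, 2, 0.5, 0.5, 1.0, 1.0, rng))
```

Clipping bounds how much any one user's contribution can move the aggregate; how a single sensitive entity shared across several users affects that bound is exactly the part this sketch leaves out.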
Taxonomy
Topics: Privacy-Preserving Technologies in Data
