Context-Aware Differential Privacy for Language Modeling
My H. Dinh, Ferdinando Fioretto

TL;DR
This paper presents CADP-LM, a language model framework that uses context-aware differential privacy to selectively protect sensitive sentences and contexts, enhancing privacy without sacrificing accuracy.
Contribution
It introduces a novel context-aware differential privacy approach for language models, enabling targeted privacy protection of sensitive information.
Findings
Effective protection of sensitive sentences demonstrated
High accuracy maintained while ensuring privacy
Versatile across multiple datasets and settings
Abstract
The remarkable ability of language models (LMs) has also brought challenges at the interface of AI and security. A critical challenge pertains to how much information these models retain and leak about the training data. This is particularly urgent as the typical development of LMs relies on large volumes of often highly sensitive data, such as emails and chat logs. To address this shortcoming, this paper introduces the Context-Aware Differentially Private Language Model (CADP-LM), a privacy-preserving LM framework that relies on two key insights. First, it utilizes the notion of context to define and audit the potentially sensitive information. Second, it adopts the notion of Differential Privacy to protect sensitive information and characterize the privacy leakage. A unique characteristic of CADP-LM is its ability to target the protection of sensitive sentences and contexts only, providing…
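For reference, the Differential Privacy notion invoked in the abstract is usually stated as the inequality below. This is the standard textbook form of (ε, δ)-differential privacy, not a formula quoted from the paper. The symbols M (a randomized mechanism, such as a training procedure), D and D' (datasets differing in one record), S (any set of outputs), ε, and δ are conventional notation assumed here for illustration. How CADP-LM restricts this guarantee to sensitive sentences and contexts is not specified in this excerpt.

```latex
% Standard (\varepsilon, \delta)-differential privacy (textbook form, not quoted from the paper).
% \mathcal{M}: randomized mechanism; D, D': neighboring datasets; S: any set of outputs.
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
\]
```

Smaller values of ε and δ mean that the output distribution changes less when a single record is added or removed, which is the sense in which the definition bounds privacy leakage.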
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Topic Modeling
