Context-Aware Differential Privacy for Language Modeling

My H. Dinh; Ferdinando Fioretto

arXiv:2301.12288·cs.LG·January 31, 2023

TL;DR

This paper presents CADP-LM, a language model framework that uses context-aware differential privacy to selectively protect sensitive sentences and contexts, enhancing privacy without sacrificing accuracy.

Contribution

It introduces a novel context-aware differential privacy approach for language models, enabling targeted privacy protection of sensitive information.

Findings

01. Effective protection of sensitive sentences demonstrated

02. High accuracy maintained while ensuring privacy

03. Versatile across multiple datasets and settings

Abstract

The remarkable ability of language models (LMs) has also brought challenges at the interface of AI and security. A critical challenge pertains to how much information these models retain and leak about the training data. This is particularly urgent as the typical development of LMs relies on large amounts of often highly sensitive data, such as emails and chat logs. To address this shortcoming, this paper introduces the Context-Aware Differentially Private Language Model (CADP-LM), a privacy-preserving LM framework that relies on two key insights. First, it uses the notion of context to define and audit potentially sensitive information. Second, it adopts the notion of Differential Privacy to protect sensitive information and characterize the privacy leakage. A unique characteristic of CADP-LM is its ability to target the protection of sensitive sentences and contexts only, providing…
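
The abstract does not give the training procedure, so the following is only an illustrative sketch of what "protecting sensitive sentences only" could look like in a gradient-based setting. It assumes a DP-SGD-style Gaussian mechanism (clip each per-example gradient, then add Gaussian noise) applied solely to examples flagged as sensitive. The function name `selective_private_sum`, the `is_sensitive` flags, and the parameters `clip_norm` and `noise_multiplier` are hypothetical and do not come from the paper, and the sketch makes no claim about the formal guarantee that CADP-LM provides.

```python
import numpy as np


def selective_private_sum(grads, is_sensitive, clip_norm=1.0,
                          noise_multiplier=1.0, rng=None):
    """Sum per-example gradients, clipping and noising only flagged ones.

    Hypothetical illustration, not the CADP-LM algorithm.
    grads: array-like of shape (n_examples, dim).
    is_sensitive: boolean flags of length n_examples (assumed to come
        from some context-based detector, which is not modeled here).
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = np.asarray(grads, dtype=float)
    mask = np.asarray(is_sensitive, dtype=bool)

    sensitive = grads[mask]
    non_sensitive = grads[~mask]

    # Clip each sensitive gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(sensitive, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    total = (sensitive * scale).sum(axis=0)

    # Add Gaussian noise calibrated to the clipping bound, but only
    # when the batch actually contains sensitive examples.
    if mask.any():
        sigma = noise_multiplier * clip_norm
        total = total + rng.normal(0.0, sigma, size=grads.shape[1])

    # Non-sensitive gradients are summed without clipping or noise.
    return total + non_sensitive.sum(axis=0)


if __name__ == "__main__":
    batch = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
    flags = [True, False, False]
    print(selective_private_sum(batch, flags, rng=np.random.default_rng(0)))
```

In this sketch only the first example is clipped and noised, while the other two contribute their exact gradients. Whether such selective noising yields a meaningful privacy guarantee depends on how sensitivity is defined and detected, which is the part the paper ties to its notion of context.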

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Topic Modeling