Sentence-level Privacy for Document Embeddings

Casey Meehan; Khalil Mrini; Kamalika Chaudhuri

arXiv:2205.04605 · cs.LG · May 11, 2022 · 1 citation

TL;DR

This paper introduces SentDP, a form of pure local differential privacy at the sentence level for document embeddings, and shows that embeddings satisfying it remain useful for downstream tasks such as sentiment analysis.

Contribution

The paper presents DeepCandidate, a novel technique that combines ideas from robust statistics and language modeling to produce high-dimensional, general-purpose document embeddings satisfying $\epsilon$-SentDP.
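The summary describes DeepCandidate only at a high level. The sketch below shows one general way robust statistics can yield a sentence-level private embedding: score candidate embeddings by a depth measure relative to the document's sentence embeddings, then pick one with the exponential mechanism. It rests on several assumptions not stated here: the candidates are drawn from public data (so they do not depend on the private document), sentence embeddings come from some encoder that is not shown, and `coordinatewise_depth`, `selection_probabilities`, and `select_candidate` are hypothetical names for an illustrative sensitivity-1 score and selection step that may differ from what the paper actually uses.

```python
import numpy as np


def coordinatewise_depth(candidate: np.ndarray, sentence_embs: np.ndarray) -> int:
    # Hypothetical sensitivity-1 depth score (an assumption, not necessarily the
    # paper's): for each coordinate, count sentence embeddings on each side of the
    # candidate, keep the smaller count, then take the minimum over coordinates.
    # Substituting one sentence changes each count by at most 1, so the score
    # changes by at most 1.
    below = (sentence_embs <= candidate).sum(axis=0)
    above = (sentence_embs >= candidate).sum(axis=0)
    return int(np.minimum(below, above).min())


def selection_probabilities(candidates, sentence_embs, epsilon):
    # Exponential mechanism with sensitivity 1: P(i) is proportional to
    # exp(epsilon * score_i / 2), which gives pure epsilon-DP with respect to
    # substituting one sentence embedding.
    scores = np.array(
        [coordinatewise_depth(c, sentence_embs) for c in candidates], dtype=float
    )
    logits = epsilon * scores / 2.0
    logits -= logits.max()  # shift for numerical stability; probabilities unchanged
    weights = np.exp(logits)
    return weights / weights.sum()


def select_candidate(candidates, sentence_embs, epsilon, rng):
    # Candidates are assumed to come from public data, independent of the document.
    probs = selection_probabilities(candidates, sentence_embs, epsilon)
    return candidates[rng.choice(len(candidates), p=probs)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    doc = rng.normal(size=(12, 8))         # toy stand-in for 12 sentence embeddings
    candidates = rng.normal(size=(50, 8))  # toy stand-in for public candidate embeddings
    emb = select_candidate(candidates, doc, epsilon=1.0, rng=rng)
    print(emb.shape)  # (8,)
```

Because every released vector is one of the public candidates and the only document-dependent step is the exponential mechanism over a sensitivity-1 score, replacing any single sentence changes each output probability by at most a factor of $e^{\epsilon}$.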

Findings

01

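SentDP provides a pure differential privacy guarantee at the sentence level: substituting any one sentence leaves the document embedding $\epsilon$-indistinguishable.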

02

The private embeddings remain useful for downstream tasks such as sentiment analysis and topic classification.

03

SentDP embeddings outperform baselines with weaker privacy guarantees, such as word-level Metric DP.

Abstract

User language data can contain highly sensitive personal content. As such, it is imperative to offer users a strong and interpretable privacy guarantee when learning from their data. In this work, we propose SentDP: pure local differential privacy at the sentence level for a single user document. We propose a novel technique, DeepCandidate, that combines concepts from robust statistics and language modeling to produce high-dimensional, general-purpose $\epsilon$-SentDP document embeddings. This guarantees that any single sentence in a document can be substituted with any other sentence while keeping the embedding $\epsilon$-indistinguishable. Our experiments indicate that these private document embeddings are useful for downstream tasks like sentiment analysis and topic classification and even outperform baseline methods with weaker guarantees like word-level Metric DP.
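The sentence-substitution guarantee in the abstract can be written as a standard pure differential privacy inequality. In the notation here, which is ours rather than the paper's, a document $x = (s_1, \dots, s_k)$ is a sequence of sentences, $x'$ is obtained by replacing one sentence $s_i$ with an arbitrary sentence $s_i'$, and $\mathcal{M}$ is the randomized mechanism that maps a document to an embedding in $\mathbb{R}^d$. $\mathcal{M}$ is $\epsilon$-SentDP if

$$
\Pr[\mathcal{M}(x) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(x') \in S]
\quad \text{for all such pairs } x, x' \text{ and all measurable } S \subseteq \mathbb{R}^d .
$$

Smaller $\epsilon$ means a stronger guarantee: swapping any one sentence can change the probability of any set of embeddings by at most a factor of $e^{\epsilon}$.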

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Hate Speech and Cyberbullying Detection · Privacy, Security, and Data Protection