Privacy Guarantees for De-identifying Text Transformations

David Ifeoluwa Adelani, Ali Davody, Thomas Kleinbauer, and Dietrich Klakow

arXiv:2008.03101 · cs.CL · November 16, 2022

TL;DR

This paper establishes formal privacy guarantees for text de-identification methods using differential privacy and compares their impact on machine learning task performance, highlighting the robustness of word-by-word replacement strategies.

Contribution

The paper introduces formal differential privacy guarantees for text transformations and evaluates their utility in NLP tasks, comparing simple redaction with deep learning-based replacement.
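For reference, the guarantees are stated in terms of differential privacy. The block below gives only the textbook definition of ε-differential privacy: a randomized mechanism M is ε-differentially private if, for every pair of neighbouring inputs D and D' and every set of outputs S, the condition below holds. Smaller ε means the output reveals less about which of the two inputs was used. How the paper defines "neighbouring" texts and instantiates M for a particular de-identifying transformation is not reproduced on this page, so treat this as background, not as the paper's derivation.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
\qquad \text{for all neighbouring } D, D' \text{ and all measurable } S
```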

Findings

1. Word-by-word replacement maintains task performance better.
2. Differential privacy guarantees can be formalized for text de-identification.
3. Sophisticated replacement methods outperform redaction in privacy-utility trade-offs.

Abstract

Machine learning approaches to natural language processing tasks benefit from a comprehensive collection of real-life user data. At the same time, there is a clear need to protect the privacy of the users whose data is collected and processed. For text collections such as transcripts of voice interactions or patient records, replacing sensitive parts with benign alternatives can provide de-identification. However, how much privacy is actually guaranteed by such text transformations, and are the resulting texts still useful for machine learning? In this paper, we derive formal privacy guarantees for general text transformation-based de-identification methods on the basis of Differential Privacy. We also measure the effect that different ways of masking private information in dialog transcripts have on a subsequent machine learning task. To this end, we formulate different…
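To make the two masking styles concrete, here is a minimal sketch contrasting redaction (each sensitive token becomes a type placeholder) with word-by-word replacement (each sensitive token is independently swapped for a benign surrogate of the same type). It assumes the sensitive tokens are already tagged; the input format, the tag names "PER" and "LOC", the surrogate pools, and the functions `redact` and `replace_word_by_word` are all hypothetical illustrations, not the paper's code. In particular, the paper uses deep learning models to produce replacements, whereas this sketch simply samples uniformly from fixed lists.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input format: (token, tag) pairs, where tag is None for
# non-sensitive tokens or an entity type such as "PER" or "LOC".
TaggedTokens = List[Tuple[str, Optional[str]]]


def redact(tokens: TaggedTokens) -> List[str]:
    """Replace every sensitive token with a placeholder naming its type."""
    return [f"<{tag}>" if tag else tok for tok, tag in tokens]


def replace_word_by_word(
    tokens: TaggedTokens,
    surrogates: Dict[str, Sequence[str]],
    rng: random.Random,
) -> List[str]:
    """Swap each sensitive token, independently, for a random surrogate.

    Assumption: surrogates are drawn uniformly from a fixed pool per type,
    standing in for the learned replacement models described in the paper.
    """
    out: List[str] = []
    for tok, tag in tokens:
        if tag is None:
            out.append(tok)  # non-sensitive tokens pass through unchanged
        else:
            out.append(rng.choice(surrogates[tag]))
    return out


if __name__ == "__main__":
    sentence = [("Call", None), ("Alice", "PER"), ("in", None), ("Berlin", "LOC")]
    pools = {"PER": ["Maria", "John", "Chen"], "LOC": ["Paris", "Lagos", "Lima"]}
    print(" ".join(redact(sentence)))  # Call <PER> in <LOC>
    print(" ".join(replace_word_by_word(sentence, pools, random.Random(0))))
```

Redaction leaves an obvious placeholder that a downstream model never sees in real data, while word-by-word replacement keeps the text looking natural, which is one plausible reason replacement can preserve task performance better.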

Code & Models

Repositories

uds-lsv/privacy-preserving-text-transformer (official)