DP-MLM: Differentially Private Text Rewriting Using Masked Language Models

Stephen Meisenbacher, Maulik Chevli, Juraj Vladika, and Florian Matthes

arXiv:2407.00637 · cs.CL · July 2, 2024

TL;DR

This paper introduces DP-MLM, a differentially private text rewriting method based on masked language models. Compared with previous autoregressive approaches, it improves the privacy-utility trade-off and offers greater customization.

Contribution

The paper proposes DP-MLM, a new privacy-preserving text rewriting technique leveraging masked language models for better utility and flexibility.

Findings

01. MLMs outperform autoregressive models at lower ε, that is, under stronger privacy guarantees
02. DP-MLM preserves utility better than previous autoregressive rewriting methods
03. DP-MLM allows greater customization of the rewriting mechanism
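As background for Finding 01, the math below shows how ε controls a per-token sampler. It rests on two assumptions that this page does not state. First, DP-MLM is assumed to draw each replacement token with the exponential mechanism over MLM logits u(x, i, w), clipped to [u_min, u_max]. Second, per-token budgets are assumed to add up by sequential composition over an n-token text. Here V is the candidate vocabulary and ε is the per-token budget.

```latex
\begin{gathered}
\Pr[\tilde{w}_i = w \mid x] = \frac{\exp\big(\varepsilon\, u(x,i,w) / (2\Delta u)\big)}{\sum_{w' \in V} \exp\big(\varepsilon\, u(x,i,w') / (2\Delta u)\big)}, \qquad \Delta u = u_{\max} - u_{\min} \\
\varepsilon_{\mathrm{total}} = n\,\varepsilon
\end{gathered}
```

As ε approaches 0, the distribution approaches uniform over V, which gives strong privacy and heavy rewriting. As ε grows, the probability mass concentrates on the highest-scoring tokens. Lower ε is therefore the stronger-privacy regime that the abstract refers to.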

Abstract

The task of text privatization using Differential Privacy has recently taken the form of text rewriting, in which an input text is obfuscated using generative (large) language models. While these methods have shown promising results in preserving privacy, they rely on autoregressive models, which lack a mechanism to contextualize the private rewriting process. In response, we propose DP-MLM, a new method for differentially private text rewriting that leverages masked language models (MLMs) to rewrite text in a manner that is both semantically similar and obfuscated. We accomplish this with a simple contextualization technique, whereby we rewrite a text one token at a time. We find that utilizing encoder-only MLMs provides better utility preservation at lower ε levels, as compared to previous methods relying on…
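The token-by-token rewriting described in the abstract can be pictured with the minimal sketch below. It is not the authors' implementation; that code lives in the repository listed under Code & Models. The sketch makes three assumptions:

- Each position is replaced by sampling from the exponential mechanism over logits clipped to a fixed range [clip_min, clip_max].
- The hypothetical `toy_scorer`, with its tiny `VOCAB` and `SYNONYMS` tables, stands in for an encoder-only MLM. A real MLM would see the original text alongside a copy in which position i is masked.
- The per-token budgets add up by sequential composition.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# A scorer maps (original tokens, position) -> {candidate token: logit}.
# In a real system this would be an encoder-only MLM; here it is a toy.
Scorer = Callable[[Sequence[str], int], Dict[str, float]]

# HYPOTHETICAL toy vocabulary and synonym table standing in for an MLM.
VOCAB = ["the", "a", "movie", "film", "was", "is", "great", "good", "bad"]
SYNONYMS = {"the": "a", "movie": "film", "was": "is", "great": "good"}


def toy_scorer(context: Sequence[str], i: int) -> Dict[str, float]:
    """Hypothetical MLM stand-in: high logit for the original token,
    a lower one for its synonym, zero for everything else."""
    target = context[i]
    return {
        w: 5.0 if w == target else (3.0 if SYNONYMS.get(target) == w else 0.0)
        for w in VOCAB
    }


def exp_mech_sample(logits: Dict[str, float], epsilon: float,
                    clip_min: float, clip_max: float,
                    rng: random.Random) -> str:
    """Draw one token via the exponential mechanism.

    Clipping bounds the utility, so its sensitivity is
    delta = clip_max - clip_min; each token is sampled with probability
    proportional to exp(epsilon * u / (2 * delta)).
    """
    delta = clip_max - clip_min
    tokens = list(logits)
    clipped = [min(max(logits[t], clip_min), clip_max) for t in tokens]
    scaled = [epsilon * u / (2.0 * delta) for u in clipped]
    top = max(scaled)  # shift by the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


def rewrite(tokens: List[str], scorer: Scorer, epsilon_per_token: float,
            clip_min: float, clip_max: float, seed: int = 0) -> List[str]:
    """Rewrite a text one token at a time.

    Every position is scored against the ORIGINAL tokens, then replaced by a
    privately sampled token. Under sequential composition the full rewrite
    spends len(tokens) * epsilon_per_token.
    """
    rng = random.Random(seed)
    return [
        exp_mech_sample(scorer(tokens, i), epsilon_per_token,
                        clip_min, clip_max, rng)
        for i in range(len(tokens))
    ]


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "great"]
    # Very large per-token epsilon: the original tokens are kept
    # (with overwhelming probability).
    print(rewrite(sentence, toy_scorer, 1000.0, 0.0, 5.0))
    # Small per-token epsilon: sampling is close to uniform over VOCAB.
    print(rewrite(sentence, toy_scorer, 1.0, 0.0, 5.0, seed=1))
```

Lowering `epsilon_per_token` flattens the sampling distribution, so more tokens are replaced. Raising it keeps the rewrite close to the input. Scoring every position against the unmodified original, rather than against the partially rewritten output, is one way to read the abstract's emphasis on contextualization. That choice is an assumption of this sketch.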

Code & Models

Repositories

sjmeis/dpmlm (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques