A Customized Text Sanitization Mechanism with Differential Privacy

Huimin Chen; Fengran Mo; Yanhao Wang; Cen Chen; Jian-Yun Nie; Chengyu Wang; Jamie Cui

arXiv:2207.01193 · cs.CR · September 4, 2023 · 1 citation

TL;DR

This paper introduces CusText, a novel text sanitization method based on epsilon-differential privacy that is compatible with any similarity measure and offers improved privacy-utility trade-offs in NLP tasks.

Contribution

CusText is a new text sanitization mechanism that overcomes limitations of existing methods by supporting any similarity measure and providing token-level privacy customization.
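For reference, the standard $\epsilon$-DP guarantee that the contribution builds on can be written for a token-level mechanism as below. The symbols $\mathcal{M}$ (the sanitization mechanism), $x, x'$ (input tokens), and $y$ (an output token) are notation assumed here for illustration. How the paper scopes the pairs of inputs, for example to tokens that share an output set, is not stated in this summary.

```latex
\Pr[\mathcal{M}(x) = y] \;\le\; e^{\epsilon} \, \Pr[\mathcal{M}(x') = y]
\qquad \text{for all input tokens } x, x' \text{ and every output token } y
```

A smaller $\epsilon$ forces the output distributions of any two inputs closer together, which strengthens privacy at the cost of utility.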

Findings

01. CusText outperforms existing mechanisms in privacy-utility trade-offs.

02. It is compatible with any similarity measure.

03. Extensive experiments on several benchmark datasets validate its effectiveness.

Abstract

As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, the state-of-the-art text sanitization mechanisms based on metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good trade-offs between privacy and utility. To address the above limitations, we propose a novel Customized Text (CusText) sanitization mechanism based on the original $\epsilon$-differential privacy (DP) definition, which is compatible with any similarity measure. Furthermore, CusText assigns each input token a customized output set of tokens to provide more advanced privacy protection at the token level. Extensive experiments on several benchmark datasets show that CusText achieves a better trade-off between…
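The idea of sampling a replacement from a per-token output set under $\epsilon$-DP can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the sampling step is the standard exponential mechanism with similarity scores min-max normalised to [0, 1], and the names `sampling_probabilities`, `sample_token`, `sanitize`, and `toy_similarity` are hypothetical. The toy similarity (character-set Jaccard) is only a stand-in for whatever similarity measure is plugged in.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Similarity = Callable[[str, str], float]


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in similarity: Jaccard overlap of character sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def sampling_probabilities(
    token: str, output_set: Sequence[str], similarity: Similarity, epsilon: float
) -> List[float]:
    """Exponential-mechanism probabilities over the token's output set."""
    scores = [similarity(token, candidate) for candidate in output_set]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # Normalise scores to [0, 1] so the utility sensitivity is at most 1.
    utilities = [(s - lo) / span if span > 0 else 1.0 for s in scores]
    # Assumed form of the mechanism: weight proportional to exp(eps * u / 2).
    weights = [math.exp(epsilon * u / 2.0) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def sample_token(
    token: str,
    output_set: Sequence[str],
    similarity: Similarity,
    epsilon: float,
    rng: random.Random,
) -> str:
    """Draw one replacement token from the customized output set."""
    probs = sampling_probabilities(token, output_set, similarity, epsilon)
    return rng.choices(list(output_set), weights=probs, k=1)[0]


def sanitize(
    tokens: Sequence[str],
    output_sets: Dict[str, Sequence[str]],
    similarity: Similarity,
    epsilon: float,
    seed: int = 0,
) -> List[str]:
    """Replace every token independently; each token needs an output set."""
    rng = random.Random(seed)
    return [
        sample_token(t, output_sets[t], similarity, epsilon, rng) for t in tokens
    ]


if __name__ == "__main__":
    shared = ["good", "great", "bad"]
    mapping = {word: shared for word in shared}
    print(sanitize(["good", "bad"], mapping, toy_similarity, epsilon=1.0))
```

Because the normalised utilities lie in [0, 1], any two input tokens that share the same output set produce output probabilities within a factor of $e^{\epsilon}$ of each other under this sketch. Nothing here requires the similarity function to be a metric.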

Code & Models

Repositories

sai4july/custext (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data