On the Vulnerability of Text Sanitization

Meng Tong; Kejiang Chen; Xiaojian Yuan; Jiayang Liu; Weiming Zhang; Nenghai Yu; Jie Zhang

arXiv:2410.17052 · cs.CR · May 6, 2025

TL;DR

This paper evaluates how well text sanitization protects privacy by developing both theoretically optimal and practical reconstruction attacks. The attacks reveal significant vulnerabilities and call for a reassessment of current sanitization methods.

Contribution

The paper introduces theoretically grounded and practical reconstruction attacks that outperform existing attacks, giving a more accurate assessment of the privacy risks of text sanitization.

Findings

1. One of the proposed attacks improved the attack success rate (ASR) by 46.4% over the baseline.

2. The attacks revealed significant vulnerabilities in current text sanitization methods.

3. The paper provides bounds on ASR as benchmarks for evaluating sanitization.

Abstract

Text sanitization, which employs differential privacy to replace sensitive tokens with new ones, represents a significant technique for privacy protection. Typically, its performance in preserving privacy is evaluated by measuring the attack success rate (ASR) of reconstruction attacks, where attackers attempt to recover the original tokens from the sanitized ones. However, current reconstruction attacks on text sanitization are developed empirically, making it challenging to accurately assess the effectiveness of sanitization. In this paper, we aim to provide a more accurate evaluation of sanitization effectiveness. Inspired by the works of Palamidessi et al., we implement theoretically optimal reconstruction attacks targeting text sanitization. We derive their bounds on ASR as benchmarks for evaluating sanitization performance. For real-world applications, we propose two practical…
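
The evaluation setup described in the abstract can be illustrated with a minimal sketch. It assumes a toy four-token vocabulary, a hypothetical exponential-mechanism-style sanitizer in which P(y | x) is proportional to exp(-ε · d(x, y) / 2), and an attacker who knows both the token prior and the mechanism. The function names, embeddings, and prior are illustrative assumptions and are not taken from the paper or its repository; the attack shown is the generic Bayes-optimal guess (pick the original token with the highest posterior), not necessarily the authors' exact construction.

```python
import math

# Hypothetical toy vocabulary: 2-D embeddings and a prior over original tokens.
EMBEDDINGS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
PRIOR = [0.4, 0.3, 0.2, 0.1]


def sanitization_matrix(embeddings, epsilon):
    """Return rows P(y | x), with P(y | x) proportional to exp(-epsilon * d(x, y) / 2)."""
    n = len(embeddings)
    matrix = []
    for x in range(n):
        weights = [
            math.exp(-epsilon * math.dist(embeddings[x], embeddings[y]) / 2)
            for y in range(n)
        ]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def bayes_attack(prior, channel):
    """For each sanitized token y, guess the x that maximizes P(x) * P(y | x)."""
    n = len(prior)
    return [max(range(n), key=lambda x: prior[x] * channel[x][y]) for y in range(n)]


def expected_asr(prior, channel, guess):
    """Probability that the guess for the observed y equals the original token."""
    return sum(prior[guess[y]] * channel[guess[y]][y] for y in range(len(prior)))


# Assumed epsilon values, chosen only to show the trend on this toy example.
for eps in (0.0, 1.0, 4.0, 16.0):
    channel = sanitization_matrix(EMBEDDINGS, eps)
    asr = expected_asr(PRIOR, channel, bayes_attack(PRIOR, channel))
    print(f"epsilon={eps:>4}: Bayes-optimal ASR = {asr:.3f}")
```

With ε = 0 the sanitized token carries no information about the original, so the optimal guess is always the most likely token and the ASR equals the largest prior probability. No other attacker with the same knowledge can exceed the Bayes-optimal ASR on this toy mechanism, which is why such a value can serve as an upper benchmark.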

Code & Models

Repositories

mengtong0110/on-the-vulnerability-of-text-sanitization
PyTorch · Official

Videos

On the Vulnerability of Text Sanitization · Underline

Taxonomy

Topics: Digital and Cyber Forensics