Reconstruction of Differentially Private Text Sanitization via Large Language Models

Shuchao Pang; Zhigang Lu; Haichen Wang; Peng Fu; Yongbin Zhou; Minhui Xue

arXiv:2410.12443 · cs.CR · September 19, 2025

Open Access

TL;DR

This paper shows that large language models can reconstruct private information from text sanitized with differential privacy (DP). It proposes black-box and white-box attacks that achieve high recovery rates, exposing a new security risk for DP text sanitization.

Contribution

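The study introduces two attacks on DP-sanitized text, chosen according to the attacker's access to the LLM: a black-box attack that supplies sample text pairs as instructions, and a white-box attack that uses such pairs as fine-tuning data. Both expose weaknesses in current DP text sanitization techniques.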

Findings

01. High recovery rates of private data from DP-sanitized text (up to 94%)

02. Effective black-box and white-box attack strategies demonstrated across multiple LLMs

03. LLMs pose new security risks to existing DP text sanitization methods

Abstract

Differential privacy (DP) is the de facto privacy standard against privacy leakage attacks, including many recently discovered ones against large language models (LLMs). However, we discovered that LLMs could reconstruct the altered/removed privacy from given DP-sanitized prompts. We propose two attacks (black-box and white-box) based on the accessibility to LLMs and show that LLMs could connect the pair of DP-sanitized text and the corresponding private training data of LLMs by giving sample text pairs as instructions (in the black-box attacks) or fine-tuning data (in the white-box attacks). To illustrate our findings, we conduct comprehensive experiments on modern LLMs (e.g., LLaMA-2, LLaMA-3, ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, Claude-3, Claude-3.5, OPT, GPT-Neo, GPT-J, Gemma-2, and Pythia) using commonly used datasets (such as WikiMIA, Pile-CC, and Pile-Wiki) against both word-level…
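To make the black-box setting in the abstract concrete, the sketch below shows one way sample text pairs could be arranged as instructions in a single prompt. It is a minimal illustration only: the function name `build_few_shot_prompt`, the prompt wording, and the example sentences are all hypothetical and are not taken from the paper, whose actual prompt template is not given here.

```python
from typing import List, Tuple


def build_few_shot_prompt(pairs: List[Tuple[str, str]], sanitized_target: str) -> str:
    """Assemble an in-context prompt from (sanitized, original) example pairs.

    The layout below is an assumed template for illustration, not the
    authors' prompt.
    """
    lines = ["Each example shows a sanitized text followed by its original text."]
    for i, (sanitized, original) in enumerate(pairs, start=1):
        lines.append(f"Example {i}")
        lines.append(f"Sanitized: {sanitized}")
        lines.append(f"Original: {original}")
    # The final case leaves the original text blank for the model to fill in.
    lines.append("Now complete the final case.")
    lines.append(f"Sanitized: {sanitized_target}")
    lines.append("Original:")
    return "\n".join(lines)


# Made-up, non-sensitive demonstration pairs (hypothetical data).
demo_pairs = [
    ("the cat sat on the rug", "the cat sat on the mat"),
    ("she visited a town in spring", "she visited a city in spring"),
]
prompt = build_few_shot_prompt(demo_pairs, "he read a journal at night")
print(prompt)
```

In the white-box setting described in the abstract, the same kind of pairs would instead serve as fine-tuning data rather than as in-prompt instructions.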

Taxonomy

Topics: Privacy-Preserving Technologies in Data

Methods: GPT-Neo · OPT