Truthful Text Sanitization Guided by Inference Attacks

Ildikó Pilán; Benet Manzanares-Salor; David Sánchez; Pierre Lison

arXiv:2412.12928 · cs.CL · September 3, 2025


TL;DR

This paper introduces a novel text sanitization method that uses instruction-tuned large language models (LLMs) to balance privacy and utility. It generates and then selects informative replacements for the personal information in a document, keeping only those that withstand inference attacks.

Contribution

The approach uses a two-stage LLM process to generate and then evaluate replacement candidates, and it introduces new privacy and utility metrics that require no manual annotation.

Findings

01. Enhanced utility with minimal re-identification risk
02. More truth-preserving than existing methods
03. Effective balance between privacy and utility

Abstract

Text sanitization aims to rewrite parts of a document to prevent disclosure of personal information. The central challenge of text sanitization is to strike a balance between privacy protection (avoiding the leakage of personal information) and utility preservation (retaining as much as possible of the document's original content). To this end, we introduce a novel text sanitization method based on generalizations, that is, broader but still informative terms that subsume the semantic content of the original text spans. The approach relies on the use of instruction-tuned large language models (LLMs) and is divided into two stages. Given a document including text spans expressing personally identifiable information (PII), the LLM is first applied to obtain truth-preserving replacement candidates for each text span and rank those according to their abstraction level. Those candidates are…
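The abstract describes ranking truth-preserving candidates by abstraction level, and the TL;DR and title indicate that the final replacement is the informative candidate that resists inference attacks. The Python sketch below illustrates that selection step under stated assumptions. `Candidate`, `sanitize_span` and `toy_attack` are hypothetical names. The attack is a hard-coded stand-in for an LLM adversary, and "the attack fails if its guess differs from the original span" is a simplifying success criterion. It is not the paper's metric, and the authors' actual procedure may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    text: str
    abstraction_level: int  # hypothetical scale: 0 = most specific


def sanitize_span(
    document: str,
    span: str,
    candidates: Sequence[Candidate],
    attack: Callable[[str], str],
    fallback: str = "***",
) -> str:
    """Replace `span` with the most specific candidate the attack cannot invert.

    Assumption: `attack` takes the sanitized document and returns its best
    guess for the original span; exact (case-insensitive) match counts as
    a successful inference.
    """
    # Try candidates from most specific (most useful) to most abstract.
    for cand in sorted(candidates, key=lambda c: c.abstraction_level):
        sanitized = document.replace(span, cand.text)
        guess = attack(sanitized)
        if guess.strip().lower() != span.strip().lower():
            return sanitized  # attack failed: keep this informative replacement
    # Every candidate was inverted by the attack: fall back to masking.
    return document.replace(span, fallback)


def toy_attack(text: str) -> str:
    # Hypothetical stand-in for an LLM inference attack.
    return "Oslo" if "capital of Norway" in text else "Stockholm"


doc = "She moved to Oslo in 2019."
cands = [
    Candidate("a European city", 2),
    Candidate("the capital of Norway", 0),
    Candidate("a Scandinavian city", 1),
]
print(sanitize_span(doc, "Oslo", cands, toy_attack))
```

In this toy run, "the capital of Norway" is rejected because the attack recovers "Oslo", so the sketch keeps the next most specific candidate, "a Scandinavian city". Ordering by abstraction level means the first candidate to survive is also the most informative one.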


Taxonomy

Topics: Digital and Cyber Forensics