Robust Utility-Preserving Text Anonymization Based on Large Language Models

Tianyu Yang; Xiaodan Zhu; Iryna Gurevych

arXiv:2407.11770·cs.CL·June 19, 2025

TL;DR

This paper introduces an LLM-based framework for text anonymization that balances protection against re-identification with data utility for downstream tasks; the authors report extensive experiments validating its effectiveness.

Contribution

The paper proposes an LLM-driven anonymization framework with dedicated privacy and utility evaluators and explores distillation for real-time applications. The framework is reported to outperform existing methods.

Findings

01 Outperforms baseline models in re-identification risk reduction
02 Preserves higher data utility in downstream tasks
03 Effective in large-scale, real-time scenarios

Abstract

Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face the emerging challenges of the re-identification ability of large language models (LLMs), which have shown advanced capability in memorizing detailed information and reasoning over dispersed pieces of patterns to draw conclusions. When defending against LLM-based re-identification, anonymization could jeopardize the utility of the resulting anonymized data in downstream tasks. In general, the interaction between anonymization and data utility requires a deeper understanding within the context of LLMs. In this paper, we propose a framework composed of three key LLM-based components: a privacy evaluator, a utility evaluator, and an optimization component, which work collaboratively to perform anonymization. Extensive experiments demonstrate that the proposed model…
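The interplay of the three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the loop structure, the utility floor, and the toy evaluators below are all assumptions made for illustration. In a real system each callable would presumably wrap an LLM prompt; here they are simple string functions so the example runs on its own.

```python
from typing import Callable, List

# Hypothetical type aliases; none of these names come from the paper or its repository.
PrivacyEval = Callable[[str], List[str]]    # returns terms judged to enable re-identification
UtilityEval = Callable[[str], float]        # returns a utility score in [0, 1]
Rewriter = Callable[[str, List[str]], str]  # returns a revised text given the flagged terms


def anonymize(
    text: str,
    privacy_eval: PrivacyEval,
    utility_eval: UtilityEval,
    rewrite: Rewriter,
    min_utility: float = 0.5,
    max_rounds: int = 5,
) -> str:
    """Iteratively rewrite `text` until no risky terms remain, utility would
    fall below `min_utility`, or `max_rounds` is reached (assumed stopping rules)."""
    current = text
    for _ in range(max_rounds):
        risky = privacy_eval(current)
        if not risky:
            break  # privacy evaluator finds nothing left to flag
        candidate = rewrite(current, risky)
        if utility_eval(candidate) < min_utility:
            break  # reject the edit rather than sacrifice too much utility
        current = candidate
    return current


# Toy stand-ins for the LLM-based components (assumptions for illustration only).
SENSITIVE = ["Alice Smith", "Berlin"]
KEYWORDS = ["cardiologist"]


def toy_privacy_eval(text: str) -> List[str]:
    # Flag any known sensitive term still present in the text.
    return [term for term in SENSITIVE if term in text]


def toy_utility_eval(text: str) -> float:
    # Fraction of task-relevant keywords that survive in the text.
    return sum(k in text for k in KEYWORDS) / len(KEYWORDS)


def toy_rewrite(text: str, risky: List[str]) -> str:
    # Mask only the first flagged term per round, so edits stay incremental.
    return text.replace(risky[0], "[REDACTED]")
```

Calling `anonymize("Alice Smith is a cardiologist in Berlin.", toy_privacy_eval, toy_utility_eval, toy_rewrite)` masks one flagged term per round and stops once the toy privacy evaluator reports nothing further. The utility floor is one possible way to express the privacy-utility trade-off; the paper may balance the two quite differently.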

Code & Models

Repositories

ukplab/arxiv2024-rupta (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Authorship Attribution and Profiling