Robust Utility-Preserving Text Anonymization Based on Large Language Models
Tianyu Yang, Xiaodan Zhu, Iryna Gurevych

TL;DR
This paper introduces an LLM-based framework for text anonymization that balances protection against re-identification with data utility for downstream tasks; extensive experiments validate its effectiveness.
Contribution
The paper proposes an LLM-driven anonymization framework with dedicated privacy and utility evaluators and explores distillation for real-time applications; the framework is reported to outperform existing methods.
Findings
Outperforms baseline models in re-identification risk reduction
Preserves higher data utility in downstream tasks
Effective in large-scale, real-time scenarios
Abstract
Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face the emerging challenges of the re-identification ability of large language models (LLMs), which have shown advanced capability in memorizing detailed information and reasoning over dispersed pieces of patterns to draw conclusions. When defending against LLM-based re-identification, anonymization could jeopardize the utility of the resulting anonymized data in downstream tasks. In general, the interaction between anonymization and data utility requires a deeper understanding within the context of LLMs. In this paper, we propose a framework composed of three key LLM-based components: a privacy evaluator, a utility evaluator, and an optimization component, which work collaboratively to perform anonymization. Extensive experiments demonstrate that the proposed model…
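The three-component design described in the abstract can be pictured as an iterative loop. The following is only a minimal sketch under loud assumptions: in the paper each component is LLM-based, whereas here each one is replaced by a toy, rule-based stand-in so that the example runs on its own. The names SENSITIVE_TERMS, privacy_evaluator, utility_evaluator, optimizer, anonymize, min_utility, and max_steps are all hypothetical and do not come from the paper.

```python
from typing import List

# Hypothetical lookup table standing in for what an LLM privacy evaluator
# might flag; maps an identifying term to a more general replacement.
SENSITIVE_TERMS = {
    "Alice Smith": "a person",
    "Berlin": "a city",
    "Siemens": "a company",
}


def privacy_evaluator(text: str) -> List[str]:
    """Toy stand-in: list the identifying terms still present in the text."""
    return [term for term in SENSITIVE_TERMS if term in text]


def utility_evaluator(original: str, candidate: str) -> float:
    """Toy stand-in: fraction of the original words kept in the candidate."""
    original_words = original.split()
    if not original_words:
        return 1.0
    candidate_words = set(candidate.split())
    kept = sum(word in candidate_words for word in original_words)
    return kept / len(original_words)


def optimizer(text: str, leaks: List[str]) -> str:
    """Toy stand-in: generalize the first flagged term."""
    term = leaks[0]
    return text.replace(term, SENSITIVE_TERMS[term])


def anonymize(text: str, min_utility: float = 0.5, max_steps: int = 10) -> str:
    """Alternate privacy and utility checks until no leak remains
    or the next edit would push utility below the threshold."""
    current = text
    for _ in range(max_steps):
        leaks = privacy_evaluator(current)
        if not leaks:
            break  # nothing identifying is left
        candidate = optimizer(current, leaks)
        if utility_evaluator(text, candidate) < min_utility:
            break  # reject the edit: it would cost too much utility
        current = candidate
    return current


if __name__ == "__main__":
    source = "Alice Smith works at Siemens in Berlin"
    print(anonymize(source, min_utility=0.0))
    print(anonymize(source, min_utility=0.5))
```

Raising min_utility makes the loop stop earlier and keep more of the original wording, at the cost of leaving some identifying terms in place. That is the privacy-utility trade-off the abstract refers to, shown here only in toy form.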
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Authorship Attribution and Profiling
