Anonymous-by-Construction: An LLM-Driven Framework for Privacy-Preserving Text
Federico Albanese, Pablo Ronco, Nicolás D'Ippolito

TL;DR
This paper introduces an LLM-driven text anonymization framework that runs locally and replaces PII with realistic, type-consistent surrogates, so text can be anonymized for AI applications without data egress while keeping its utility.
Contribution
The authors propose a novel on-premise, LLM-based substitution pipeline for privacy-preserving text anonymization, outperforming existing rule-based and NER methods in privacy and utility metrics.
Findings
Achieves state-of-the-art privacy preservation and utility balance.
Ensures low trainability loss for downstream tasks.
Enables safe, responsible deployment of AI Q&A systems.
Abstract
Responsible use of AI demands that we protect sensitive information without undermining the usefulness of data, an imperative that has become acute in the age of large language models. We address this challenge with an on-premise, LLM-driven substitution pipeline that anonymizes text by replacing personally identifiable information (PII) with realistic, type-consistent surrogates. Executed entirely within organizational boundaries using local LLMs, the approach prevents data egress while preserving fluency and task-relevant semantics. We conduct a systematic, multi-metric, cross-technique evaluation on the Action-Based Conversation Dataset, benchmarking against industry standards (Microsoft Presidio and Google DLP) and a state-of-the-art approach (ZSTS, in redaction-only and redaction-plus-substitution variants). Our protocol jointly measures privacy, semantic utility, and…
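To make the substitution idea in the abstract concrete, here is a minimal sketch of replacing PII with type-consistent surrogates. It is not the authors' pipeline: the paper uses a local LLM to find and replace PII, while this sketch swaps in two simple regular expressions as a stand-in detector. The names `PII_PATTERNS`, `SURROGATE_TEMPLATES`, and `Substituter` are hypothetical and chosen only for illustration.

```python
import re
from itertools import count

# ASSUMPTION: regex patterns stand in for the local LLM detector used in the paper.
PII_PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "PHONE": r"\b\d{3}-\d{3}-\d{4}\b",
}

# ASSUMPTION: hypothetical surrogate formats; each keeps the type of the original value.
SURROGATE_TEMPLATES = {
    "EMAIL": "user{n}@example.com",
    "PHONE": "555-010-{n:04d}",
}

# One combined pattern so the text is scanned in a single pass and
# already-inserted surrogates are never re-matched.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PII_PATTERNS.items())
)


class Substituter:
    """Replaces detected PII with surrogates, reusing the same surrogate
    for repeated occurrences of the same original value."""

    def __init__(self):
        self.mapping = {}
        self.counters = {name: count(1) for name in PII_PATTERNS}

    def _surrogate(self, match):
        pii_type = match.lastgroup  # name of the alternative that matched
        key = (pii_type, match.group(0))
        if key not in self.mapping:
            n = next(self.counters[pii_type])
            self.mapping[key] = SURROGATE_TEMPLATES[pii_type].format(n=n)
        return self.mapping[key]

    def anonymize(self, text):
        return COMBINED.sub(self._surrogate, text)


if __name__ == "__main__":
    s = Substituter()
    print(s.anonymize("Contact jane.doe@corp.com or 415-867-5309. Again: jane.doe@corp.com"))
```

Keeping a per-instance mapping means a repeated email or phone number always maps to the same surrogate, which is one simple way to keep the rewritten text internally coherent. Everything runs in-process, in line with the abstract's no-egress requirement.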
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Topic Modeling · Ethics and Social Impacts of AI
