Anonymous-by-Construction: An LLM-Driven Framework for Privacy-Preserving Text

Federico Albanese; Pablo Ronco; Nicolás D'Ippolito

arXiv:2603.17217 · cs.CL · March 19, 2026

Open Access

TL;DR

This paper introduces an LLM-driven text anonymization framework that replaces PII with realistic surrogates locally, ensuring privacy, utility, and safe deployment for AI applications without data egress.

Contribution

The authors propose a novel on-premise, LLM-based substitution pipeline for privacy-preserving text anonymization, outperforming existing rule-based and NER methods in privacy and utility metrics.

Findings

01 Achieves state-of-the-art privacy preservation and utility balance.

02 Ensures low trainability loss for downstream tasks.

03 Enables safe, responsible deployment of AI Q&A systems.

Abstract

Responsible use of AI demands that we protect sensitive information without undermining the usefulness of data, an imperative that has become acute in the age of large language models. We address this challenge with an on-premise, LLM-driven substitution pipeline that anonymizes text by replacing personally identifiable information (PII) with realistic, type-consistent surrogates. Executed entirely within organizational boundaries using local LLMs, the approach prevents data egress while preserving fluency and task-relevant semantics. We conduct a systematic, multi-metric, cross-technique evaluation on the Action-Based Conversation Dataset, benchmarking against industry standards (Microsoft Presidio and Google DLP) and a state-of-the-art approach (ZSTS, in redaction-only and redaction-plus-substitution variants). Our protocol jointly measures privacy, semantic utility, and…
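The substitution idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the local LLM detector is replaced here by two hypothetical regular expressions, and the names `PATTERNS`, `make_surrogate`, and `anonymize` are assumptions introduced only for illustration. The sketch shows the one property the abstract states, that each PII span is swapped for a surrogate of the same type, and adds a further assumption that repeated mentions map to the same surrogate so the text stays coherent.

```python
import re
from itertools import count

# Hypothetical stand-in for a local LLM detector: two simple regexes.
# A real system would detect many more PII types with a model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def make_surrogate(pii_type: str, index: int) -> str:
    """Return a type-consistent placeholder value (illustrative formats only)."""
    if pii_type == "EMAIL":
        return f"user{index}@example.com"
    if pii_type == "PHONE":
        return f"555-010-{index:04d}"
    raise ValueError(f"unsupported PII type: {pii_type}")


def anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each detected PII span with a surrogate of the same type.

    The same original value always maps to the same surrogate.
    Returns the rewritten text and the original-to-surrogate mapping.
    """
    mapping: dict[str, str] = {}
    counters = {pii_type: count(1) for pii_type in PATTERNS}

    def replacer(pii_type: str):
        def _sub(match: re.Match) -> str:
            original = match.group(0)
            if original not in mapping:
                mapping[original] = make_surrogate(
                    pii_type, next(counters[pii_type])
                )
            return mapping[original]
        return _sub

    # One pass per PII type; surrogates from earlier passes do not match
    # the patterns applied in later passes.
    for pii_type, pattern in PATTERNS.items():
        text = pattern.sub(replacer(pii_type), text)
    return text, mapping


if __name__ == "__main__":
    sample = "Email ana@corp.io or call 212-555-0187; again ana@corp.io"
    anonymized, mapping = anonymize(sample)
    print(anonymized)
    print(mapping)
```

Because everything runs in-process with no network calls, the sketch also mirrors the no-data-egress constraint in spirit; in practice the mapping would have to be stored or discarded according to the organization's own privacy policy.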

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Topic Modeling · Ethics and Social Impacts of AI