Self-Refining Language Model Anonymizers via Adversarial Distillation

Kyuyoung Kim; Hyunjun Jeon; Jinwoo Shin

arXiv:2506.01420 · cs.CL · October 27, 2025

Open Access

TL;DR

This paper presents SEAL, a distillation framework that trains small language models to perform effective text anonymization through adversarial interactions, enabling privacy protection without relying on proprietary models.

Contribution

Introduces SEAL, a novel adversarial distillation method that trains small language models both to anonymize text and to critique the anonymized outputs, reducing reliance on external proprietary models.

Findings

01. SLMs trained with SEAL achieve privacy-utility trade-offs comparable to those of a GPT-4 anonymizer.

02. Self-refinement improves the anonymization capabilities of small language models.

03. SEAL enables efficient, effective anonymization without depending on external models at inference time.

Abstract

Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data from seemingly benign text introduces emerging privacy risks. While recent LLM-based anonymization methods help mitigate such risks, they often rely on proprietary models (e.g., GPT-4), raising concerns about cost and the potential exposure of sensitive data to untrusted external systems. To address this, we introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization without relying on external models at inference time. SEAL leverages adversarial interactions between an LLM anonymizer and an inference model to collect trajectories of anonymized texts and inferred attributes, which are then used to distill anonymization and critique capabilities into SLMs…
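The adversarial interaction described in the abstract can be pictured as a loop in which an inference model tries to guess personal attributes from the current text, and an anonymizer rewrites the text in response. The following is only a minimal sketch of that idea, not the authors' implementation: `Step`, `collect_trajectory`, `toy_infer`, and `toy_anonymize` are hypothetical names, the two toy functions stand in for what would be LLM calls, and the stopping rule (`max_rounds`, or no attributes inferred) is an assumption. How the collected trajectories are turned into training data for the SLM is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical signatures; in practice both callables would wrap LLM calls.
Inferrer = Callable[[str], Dict[str, str]]          # text -> inferred attributes
Rewriter = Callable[[str, Dict[str, str]], str]     # (text, inferred) -> new text


@dataclass
class Step:
    """One point on a trajectory: a text and what the adversary inferred from it."""
    text: str
    inferred: Dict[str, str]


def collect_trajectory(text: str, anonymize: Rewriter, infer: Inferrer,
                       max_rounds: int = 3) -> List[Step]:
    """Alternate adversarial inference and anonymization, recording every step.

    Stops early once the adversary infers nothing (assumed stopping rule).
    """
    steps: List[Step] = []
    current = text
    for round_idx in range(max_rounds + 1):
        inferred = infer(current)
        steps.append(Step(current, inferred))
        if not inferred or round_idx == max_rounds:
            break
        current = anonymize(current, inferred)
    return steps


def toy_infer(text: str) -> Dict[str, str]:
    """Toy adversary: 'infers' an attribute if a known clue appears verbatim."""
    clues = {"location": "Seoul", "occupation": "nurse"}
    return {attr: value for attr, value in clues.items() if value in text}


def toy_anonymize(text: str, inferred: Dict[str, str]) -> str:
    """Toy anonymizer: masks one leaked value per round."""
    attr, value = next(iter(inferred.items()))
    return text.replace(value, f"[{attr}]")


if __name__ == "__main__":
    for step in collect_trajectory("I work as a nurse in Seoul.",
                                   toy_anonymize, toy_infer):
        print(step.text, "|", step.inferred)
```

Each recorded step pairs a text with the attributes still inferable from it, which is the kind of (anonymized text, inferred attributes) record the abstract refers to as a trajectory.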

Videos

Self-Refining Language Model Anonymizers via Adversarial Distillation · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning