PRvL: Quantifying the Capabilities and Risks of Large Language Models for PII Redaction

Leon Garza; Anantaa Kotal; Aritran Piplai; Lavanya Elluri; Prajit Das; Aman Chadha

arXiv:2508.05545·cs.CR·August 8, 2025

TL;DR

This paper evaluates how different large language model architectures and training strategies perform at redacting personally identifiable information (PII) from text, with the aim of improving privacy, accuracy, and efficiency.

Contribution

The paper provides a comprehensive analysis of LLMs for PII redaction, introduces PRvL, an open-source suite of fine-tuned models and tools, and offers practical guidance for deploying privacy-preserving redaction systems.

Findings

01. LLMs can effectively redact PII with proper configuration.

02. Trade-offs exist between redaction accuracy, semantic preservation, and computational cost.

03. PRvL enables customizable, privacy-aware PII redaction in secure environments.

Abstract

Redacting Personally Identifiable Information (PII) from unstructured text is critical for ensuring data privacy in regulated domains. Earlier approaches relied on rule-based systems and domain-specific Named Entity Recognition (NER) models, which fail to generalize across formats and contexts. Large Language Models (LLMs) offer a promising alternative: they have demonstrated strong performance in tasks that require contextual language understanding, including the redaction of PII in free-form text, and prior work suggests that with appropriate adaptation they can become effective contextual privacy learners. However, the effect of architectural and training choices on redaction performance remains underexplored. In this work, we present a comprehensive…
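
To make the abstract's point about rule-based systems concrete, here is a minimal sketch of a regex-based redactor. It is not taken from the paper or from the PRvL suite: the two patterns, the bracketed placeholder labels, and the function name `redact_with_rules` are all illustrative assumptions. The sketch shows how fixed patterns catch one surface format and miss the same information written another way.

```python
import re

# Hypothetical patterns for illustration only; not taken from the paper or PRvL.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact_with_rules(text: str) -> str:
    """Replace every match of each pattern with a bracketed type label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# A phone number and email in the exact formats the patterns expect.
formatted = redact_with_rules("Call 555-123-4567 or mail jane.doe@example.com.")

# The same kind of phone number, partly spelled out in words.
spelled_out = redact_with_rules("Call five five five, one two three, 4567.")

print(formatted)    # Call [PHONE] or mail [EMAIL].
print(spelled_out)  # Call five five five, one two three, 4567.
```

The second input passes through unchanged because no pattern anticipates digits written as words. Covering that case with rules means adding a pattern per format, which is the kind of brittleness that motivates context-aware models.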
