PII-Bench: Evaluating Query-Aware Privacy Protection Systems
Hao Shen, Zhouhong Gu, Haokai Hong, Weili Han

TL;DR
This paper introduces PII-Bench, a comprehensive evaluation framework for assessing privacy protection systems in LLMs, highlighting current models' limitations in query-aware PII masking across diverse scenarios.
Contribution
The paper presents PII-Bench, the first comprehensive benchmark for evaluating query-aware PII masking, with 2,842 test samples across 55 fine-grained PII categories and scenarios ranging from single-subject descriptions to complex multi-party interactions.
Findings
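Current models perform adequately at basic PII detection but struggle to determine which PII is relevant to the user's query.
Even state-of-the-art LLMs show significant limitations, particularly in complex multi-party scenarios.
Intelligent PII masking methods still have substantial room for improvement.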
Abstract
The widespread adoption of Large Language Models (LLMs) has raised significant privacy concerns regarding the exposure of personally identifiable information (PII) in user prompts. To address this challenge, we propose a query-unrelated PII masking strategy and introduce PII-Bench, the first comprehensive evaluation framework for assessing privacy protection systems. PII-Bench comprises 2,842 test samples across 55 fine-grained PII categories, featuring diverse scenarios from single-subject descriptions to complex multi-party interactions. Each sample is carefully crafted with a user query, context description, and standard answer indicating query-relevant PII. Our empirical evaluation reveals that while current models perform adequately in basic PII detection, they show significant limitations in determining PII query relevance. Even state-of-the-art LLMs struggle with this task,…
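As a rough illustration of the sample structure and masking idea described in the abstract, the following Python sketch models one sample (user query, context description, and the set of query-relevant PII) and masks every PII string that is not query-relevant. This is a minimal sketch under stated assumptions, not the paper's implementation: the `Sample` class, its field names, the `mask_query_unrelated` function, the category labels, and the example data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical stand-in for one benchmark sample (field names are assumed)."""
    query: str                      # the user's question
    context: str                    # text that contains PII
    pii: dict[str, str]             # PII surface string -> category label
    relevant: set[str] = field(default_factory=set)  # PII needed to answer the query


def mask_query_unrelated(sample: Sample) -> str:
    """Replace every PII string that is not query-relevant with a category tag."""
    masked = sample.context
    # Handle longer strings first so a short PII string never clips a longer one.
    for text in sorted(sample.pii, key=len, reverse=True):
        if text not in sample.relevant:
            masked = masked.replace(text, f"[{sample.pii[text].upper()}]")
    return masked


# Invented example data, for illustration only.
sample = Sample(
    query="What email address should I use to reach Alice?",
    context="Alice Chen lives at 12 Elm Street. Her email is alice@example.com.",
    pii={
        "Alice Chen": "name",
        "12 Elm Street": "address",
        "alice@example.com": "email",
    },
    relevant={"Alice Chen", "alice@example.com"},
)

masked_context = mask_query_unrelated(sample)
print(masked_context)
```

In this toy case the address is masked because the query does not need it, while the name and email are kept. The harder part that the abstract highlights, deciding which PII is query-relevant, is supplied by hand here through the `relevant` set.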
Taxonomy
Topics: Privacy, Security, and Data Protection · Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI
