The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques
Sebastian Ochs, Ivan Habernal

TL;DR
This paper critically examines how reconstruction attacks on PII removal techniques are evaluated. It finds that data leakage and data contamination undermine existing evaluations, and it highlights how difficult it is to assess privacy protection objectively without access to truly private data.
Contribution
It provides a critical analysis of existing attack evaluations on PII removal, identifies their key flaws, and argues that trustworthy assessment requires access to truly private data.
Findings
Existing attack evaluations are often flawed due to data leakage.
Proper assessment requires access to private data, which is heavily restricted.
Reported attack success, and with it the apparent vulnerability of PII removal techniques, may be overestimated.
Abstract
Removing personally identifiable information (PII) from texts is necessary to comply with various data protection regulations and to enable data sharing without compromising privacy. However, recent works show that documents sanitized by PII removal techniques are vulnerable to reconstruction attacks. Yet, we suspect that the reported success of these attacks is largely overestimated. We critically analyze the evaluation of existing attacks and find that data leakage and data contamination are not properly mitigated, leaving the question whether or not PII removal techniques truly protect privacy in real-world scenarios unaddressed. We investigate possible data sources and attack setups that avoid data leakage and conclude that only truly private data can allow us to objectively evaluate vulnerabilities in PII removal techniques. However, access to private data is heavily restricted -…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Digital and Cyber Forensics · Data Quality and Management
