The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization
Ildikó Pilán, Pierre Lison, Lilja Øvrelid, Anthi Papadopoulou, David Sánchez, Montserrat Batet

TL;DR
This paper introduces TAB, a comprehensive benchmark and evaluation framework for text anonymization, featuring an annotated corpus of court cases and metrics to assess privacy protection and utility preservation.
Contribution
It provides the first open-source, annotated corpus specifically designed for evaluating text anonymization methods, along with tailored evaluation metrics.
Findings
Baseline models demonstrate varying effectiveness in privacy protection.
The benchmark enables systematic comparison of anonymization techniques.
Evaluation metrics reveal trade-offs between privacy and utility.
Abstract
We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods. Text anonymization, defined as the task of editing a text document to prevent the disclosure of personal information, currently suffers from a shortage of privacy-oriented annotated text resources, making it difficult to properly evaluate the level of privacy protection offered by various anonymization methods. This paper presents TAB (Text Anonymization Benchmark), a new, open-source annotated corpus developed to address this shortage. The corpus comprises 1,268 English-language court cases from the European Court of Human Rights (ECHR) enriched with comprehensive annotations about the personal information appearing in each document, including their semantic category, identifier type, confidential attributes, and co-reference relations. Compared to previous work,…
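To make the annotation layers named in the abstract concrete, here is a minimal Python sketch of how one annotated mention might be represented, together with a simple coverage score for a masking system. Everything in it is an assumption made for illustration: the `Mention` class, its field names, the label values ("DIRECT", "QUASI", "NO_MASK", "PERSON", and so on), and the `masked_recall` function are hypothetical and are not taken from the paper or from the TAB release. The score is a plain recall over mentions that need masking, not necessarily one of the metrics the benchmark defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One annotated span of personal information (hypothetical schema)."""
    start: int                  # character offset, inclusive
    end: int                    # character offset, exclusive
    semantic_category: str      # assumed label, e.g. "PERSON"
    identifier_type: str        # assumed labels: "DIRECT", "QUASI", "NO_MASK"
    entity_id: str              # mentions sharing an id are treated as co-referent
    confidential_attributes: tuple = ()  # assumed free-form tags


def masked_recall(gold, masked_spans):
    """Fraction of gold mentions needing masking that lie fully inside
    some (start, end) span masked by the system under evaluation."""
    to_mask = [m for m in gold if m.identifier_type in ("DIRECT", "QUASI")]
    if not to_mask:
        return 1.0  # nothing had to be masked, so nothing was missed
    covered = sum(
        any(s <= m.start and m.end <= e for s, e in masked_spans)
        for m in to_mask
    )
    return covered / len(to_mask)


# Toy document with three annotated mentions; offsets are made up.
gold = [
    Mention(0, 8, "PERSON", "DIRECT", "e1"),
    Mention(18, 24, "LOC", "QUASI", "e2"),
    Mention(30, 34, "DATETIME", "NO_MASK", "e3"),
]

# The system masked only the first span, so one of two required mentions is covered.
recall = masked_recall(gold, [(0, 8)])
```

A recall-style score of this kind only reflects the privacy side; a utility measure would additionally need to penalize masking spans that did not require it.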
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Privacy, Security, and Data Protection · Freedom of Expression and Defamation
