GLiNER2-PII: A Multilingual Model for Personally Identifiable Information Extraction

Urchade Zaratiana; Ash Lewis; George Hurn-Maloney

arXiv:2605.09973 · cs.CL · May 12, 2026


TL;DR

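GLiNER2-PII is a small (0.3B-parameter) multilingual model that detects 42 PII entity types at character-span resolution. It is trained on synthetic data, achieves the highest span-level F1 among five systems on the SPY benchmark, and is publicly available for research and deployment.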

Contribution

The paper introduces a small multilingual PII detection model trained on a synthetic corpus of 4,910 annotated texts, which sidesteps both the scarcity of shareable annotated data and the privacy risks of collecting real PII at scale.

Findings

01. Achieves the highest span-level F1 on the SPY benchmark among the five systems compared.

02. Recognizes a taxonomy of 42 PII entity types across multiple languages.

03. Publicly released on Hugging Face for community use.
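
To make the span-level F1 in the first finding concrete, here is a minimal sketch of one common way to score it. It assumes exact-match scoring, where a predicted span counts as correct only if its start offset, end offset, and label all match a gold span. The matching criterion actually used on SPY is not stated on this page, and the function name `span_f1` and the labels `person` and `email` are hypothetical.

```python
from typing import Set, Tuple

# A span is (start, end, label) in character offsets, with end exclusive.
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching.

    Assumption: exact match. Benchmarks may instead use partial-overlap
    or label-agnostic matching, which would give different numbers.
    """
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: the second prediction is off by one character,
    # so it counts as both a false positive and a false negative.
    gold = {(0, 5, "person"), (10, 20, "email")}
    pred = {(0, 5, "person"), (11, 20, "email")}
    print(span_f1(gold, pred))
```

Under exact matching, a boundary error of a single character is penalized twice, which is why span-level scores are typically lower than token-level ones for the same model.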

Abstract

Reliable detection of personally identifiable information (PII) is increasingly important across modern data-processing systems, yet the task remains difficult: PII spans are heterogeneous, locale-dependent, context-sensitive, and often embedded in noisy or semi-structured documents. We present GLiNER2-PII, a small 0.3B-parameter model adapted from GLiNER2 and designed to recognize a broad taxonomy of 42 PII entity types at character-span resolution. Training such systems, however, is constrained by the scarcity of shareable annotated data and the privacy risks associated with collecting real PII at scale. To address this challenge, we construct a multilingual synthetic corpus of 4,910 annotated texts using a constraint-driven generation pipeline that produces diverse, realistic examples across languages, domains, formats, and entity distributions. On the challenging SPY benchmark,…
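
One practical consequence of character-span output is that detected spans can be used directly to mask the source text. The sketch below is illustrative only and assumes that predictions arrive as `(start, end, label)` tuples with an exclusive end offset. The helper `redact`, the placeholder format, and the labels `person` and `email` are hypothetical and are not taken from the paper or from the released model's API.

```python
from typing import Iterable, List, Tuple

# Assumed prediction format: (start, end, label), end exclusive.
Span = Tuple[int, int, str]


def redact(text: str, spans: Iterable[Span]) -> str:
    """Replace each character span with a bracketed, upper-cased label.

    Spans are processed left to right; a span that starts inside an
    already-redacted region is skipped rather than merged.
    """
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps a span that was already replaced
        pieces.append(text[cursor:start])   # untouched text before the span
        pieces.append(f"[{label.upper()}]")  # placeholder for the PII span
        cursor = end
    pieces.append(text[cursor:])            # trailing untouched text
    return "".join(pieces)


if __name__ == "__main__":
    text = "Contact Ana at ana@example.com today."
    name_start = text.index("Ana")
    mail_start = text.index("ana@")
    spans = [
        (mail_start, mail_start + len("ana@example.com"), "email"),
        (name_start, name_start + len("Ana"), "person"),
    ]
    print(redact(text, spans))
```

Working from character offsets rather than tokens means the masking step does not need to know how the model tokenized the input, which matters for noisy or semi-structured documents.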


Code & Models

Repositories

https://huggingface.co

Models

fastino/gliner2-privacy-filter-PII-multi (model, 4.9k downloads, 22 likes)
