Cross-Domain Transfer and Few-Shot Learning for Personal Identifiable Information Recognition

Junhong Ye; Xu Yuan; Xinying Qiu

arXiv:2507.11862 · cs.CL · January 13, 2026

TL;DR

This paper explores cross-domain transfer, data fusion, and few-shot learning techniques to improve PII recognition across various text domains, demonstrating domain-specific transferability and efficiency with limited data.

Contribution

The paper evaluates cross-domain model transfer, multi-domain data fusion, and few-shot learning for PII recognition, and shows that both transferability and fusion benefits are domain-specific.

Findings

01 Legal-domain data transfers well to biographical texts.

02 Transfer into the medical domain is less effective.

03 High-quality recognition is achievable with only 10% of the training data in low-specialization domains.

Abstract

Accurate recognition of personally identifiable information (PII) is central to automated text anonymization. This paper investigates the effectiveness of cross-domain model transfer, multi-domain data fusion, and sample-efficient learning for PII recognition. Using annotated corpora from healthcare (I2B2), legal (TAB), and biography (Wikipedia), we evaluate models across four dimensions: in-domain performance, cross-domain transferability, fusion, and few-shot learning. Results show legal-domain data transfers well to biographical texts, while medical domains resist incoming transfer. Fusion benefits are domain-specific, and high-quality recognition is achievable with only 10% of training data in low-specialization domains.
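
The four evaluation dimensions named in the abstract can be read as a grid of train/test configurations. The following is a minimal sketch of how such a grid might be enumerated, not the authors' code. The function names `experiment_grid` and `subsample` are hypothetical. Treating fusion as pooling all three corpora and treating the few-shot setting as a random fraction of the training set are both assumptions made for illustration.

```python
import random
from itertools import permutations

# Corpora named in the abstract: healthcare, legal, biography.
DOMAINS = ["I2B2", "TAB", "Wikipedia"]


def experiment_grid(domains):
    """Enumerate (setting, train_domains, test_domain) configurations.

    Assumption: "fusion" pools every domain for training. The paper may
    combine domains differently.
    """
    grid = []
    # In-domain: train and test on the same corpus.
    for d in domains:
        grid.append(("in-domain", (d,), d))
    # Cross-domain: train on one corpus, test on a different one.
    for src, tgt in permutations(domains, 2):
        grid.append(("cross-domain", (src,), tgt))
    # Fusion: train on the pooled corpora, test on each corpus in turn.
    for tgt in domains:
        grid.append(("fusion", tuple(domains), tgt))
    return grid


def subsample(examples, fraction, seed=0):
    """Return a reproducible random subset of the training examples.

    Assumption: the few-shot setting is modelled as a fixed fraction
    (for example 0.10) of the training set, drawn without replacement.
    """
    rng = random.Random(seed)
    k = max(1, round(len(examples) * fraction))
    return rng.sample(examples, k)


if __name__ == "__main__":
    for setting, train, test in experiment_grid(DOMAINS):
        print(f"{setting:12s} train={'+'.join(train):20s} test={test}")
```

With three corpora this yields three in-domain, six cross-domain, and three fusion configurations. Each one can be rerun on a reduced training set, for instance `subsample(train_examples, 0.10)`, to probe the 10% setting.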

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning