Cross-Domain Transfer and Few-Shot Learning for Personal Identifiable Information Recognition
Junhong Ye, Xu Yuan, Xinying Qiu

TL;DR
This paper explores cross-domain transfer, data fusion, and few-shot learning techniques to improve PII recognition across various text domains, demonstrating domain-specific transferability and efficiency with limited data.
Contribution
It evaluates cross-domain transfer, multi-domain data fusion, and few-shot learning for PII recognition, showing that both transferability and fusion benefits are domain-specific.
Findings
Legal data transfers well to biographical texts.
Transfer into the medical domain is less effective: medical texts resist incoming transfer from other domains.
High-quality recognition is achievable with only 10% of the training data in low-specialization domains.
Abstract
Accurate recognition of personally identifiable information (PII) is central to automated text anonymization. This paper investigates the effectiveness of cross-domain model transfer, multi-domain data fusion, and sample-efficient learning for PII recognition. Using annotated corpora from healthcare (I2B2), legal (TAB), and biography (Wikipedia), we evaluate models across four dimensions: in-domain performance, cross-domain transferability, fusion, and few-shot learning. Results show legal-domain data transfers well to biographical texts, while medical domains resist incoming transfer. Fusion benefits are domain-specific, and high-quality recognition is achievable with only 10% of training data in low-specialization domains.
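The four evaluation dimensions named in the abstract can be laid out as a grid of train/test configurations. The sketch below is only an illustration of that layout, not the authors' code: the function names `build_plan` and `subsample`, the domain labels, and the choice to model fusion as training on the union of all three domains are assumptions made here, and the paper may configure these settings differently.

```python
import random
from itertools import permutations

# Domain labels are placeholders for the three corpora named in the abstract:
# healthcare (I2B2), legal (TAB), and biography (Wikipedia).
DOMAINS = ["medical", "legal", "biography"]


def build_plan(domains, few_shot_fraction=0.1):
    """Enumerate (setting, train_domains, test_domain, train_fraction) tuples.

    Hypothetical helper: it lists which data a model would be trained and
    tested on in each setting; it does not train anything.
    """
    plan = []
    # In-domain: train and test on the same domain with all training data.
    for d in domains:
        plan.append(("in_domain", (d,), d, 1.0))
    # Cross-domain: train on one domain, test on a different one.
    for src, tgt in permutations(domains, 2):
        plan.append(("cross_domain", (src,), tgt, 1.0))
    # Fusion (assumed here to mean the union of all domains as training data).
    for tgt in domains:
        plan.append(("fusion", tuple(domains), tgt, 1.0))
    # Few-shot: train on a small fraction of the in-domain training data.
    for d in domains:
        plan.append(("few_shot", (d,), d, few_shot_fraction))
    return plan


def subsample(examples, fraction, seed=0):
    """Draw a reproducible random subset containing `fraction` of the examples."""
    rng = random.Random(seed)
    k = max(1, round(len(examples) * fraction))
    return rng.sample(examples, k)


if __name__ == "__main__":
    for setting, train, test, frac in build_plan(DOMAINS):
        print(f"{setting:12s} train={'+'.join(train):25s} test={test:10s} frac={frac}")
```

Fixing the seed in `subsample` keeps the 10% subset identical across runs, which makes few-shot results comparable between models trained on the same reduced data.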
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
