Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts
Saadullah Amin, Noon Pokaratsiri Goldstein, Morgan Kelly Wixted, Alejandro García-Rudolph, Catalina Martínez-Costa, Günter Neumann

TL;DR
This paper demonstrates that few-shot cross-lingual transfer with pre-trained language models improves de-identification of code-mixed clinical texts in low-resource settings, raising the F1-score from 73.7% to 91.2%.
Contribution
It empirically shows the effectiveness of few-shot transfer learning for multilingual clinical text de-identification, especially in code-mixed languages.
Findings
F1-score improved from 73.7% to 91.2% with few-shot transfer.
Achieved a human-evaluation F1-score of 97.2% on out-of-sample data.
Demonstrated effectiveness in low-resource, code-mixed clinical text de-identification.
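The F1-scores above can be made concrete with a small sketch of how an F1-score is typically computed for NER-style de-identification. This is an illustration under stated assumptions, not the paper's evaluation code: it assumes BIO-tagged tokens and exact-match, entity-level scoring. The labels `NAME` and `DATE` and the helper names `bio_to_spans` and `span_f1` are hypothetical.

```python
from typing import List, Set, Tuple

# (start_token, end_token_exclusive, label); the layout is an assumption for this sketch
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labeled spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # A stray "I-" tag is treated as the start of a new span.
                start, label = i, tag[2:]
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over predicted vs. gold spans."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the model finds the name but misses the date.
gold_tags = ["B-NAME", "I-NAME", "O", "B-DATE", "O"]
pred_tags = ["B-NAME", "I-NAME", "O", "O", "O"]
precision, recall, f1 = span_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In de-identification, recall is usually the more safety-relevant figure, since a missed PHI span stays in the released text. The F1-score weights precision and recall equally.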
Abstract
Despite the advances in digital healthcare systems offering curated structured knowledge, much of the critical information still lies in large volumes of unlabeled and unstructured clinical texts. These texts, which often contain protected health information (PHI), are exposed to information extraction tools for downstream applications, risking patient identification. Existing works in de-identification rely on using large-scale annotated corpora in English, which often are not suitable in real-world multilingual settings. Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings. In this work, we empirically show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Interpreting and Communication in Healthcare
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Dropout · WordPiece · Adam · Dense Connections · Attention Dropout · Multi-Head Attention
