Transferability of Neural Network Clinical De-identification Systems
Kahyun Lee, Nicholas J. Dobbins, Bridget McInnes, Meliha Yetisgen, Ozlem Uzuner

TL;DR
This study evaluates how well a neural clinical de-identification system, NeuroNER, transfers across datasets, note types, and institutions, and identifies fine-tuning and the use of external data as key factors.
Contribution
It systematically assesses transferability of NeuroNER across multiple clinical datasets and proposes architectural modifications for domain generalization.
Findings
External sources raise transfer performance to an F1-score of around 80%.
Fine-tuning is the most effective transfer strategy.
External data remains useful even with in-domain training data.
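The F1-score cited in the findings combines precision and recall into one number. The summary does not say which matching criterion the authors used, so the following is only a minimal sketch that assumes strict entity-level matching (a predicted span counts only if its offsets and type both match a gold annotation). The `Entity` tuple layout, the `entity_f1` function, and the sample annotations are hypothetical and are not taken from the paper or from NeuroNER.

```python
from typing import Set, Tuple

# Hypothetical representation: (start_offset, end_offset, phi_type).
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict exact-match scoring (an assumption)."""
    # A true positive is a predicted entity identical to a gold entity.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations for illustration only.
gold = {(0, 8, "NAME"), (15, 25, "DATE"), (40, 52, "HOSPITAL"), (60, 70, "PHONE")}
predicted = {(0, 8, "NAME"), (15, 25, "DATE"), (40, 52, "HOSPITAL"), (80, 85, "DATE")}

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In a transfer setting, the same function would be applied to predictions on the target dataset from a model trained on external data, before and after fine-tuning, to compare strategies.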
Abstract
Objective: Neural network de-identification studies have focused on individual datasets. These studies assume the availability of a sufficient amount of human-annotated data to train models that can generalize to corresponding test data. In real-world situations, however, researchers often have limited or no in-house training data. Existing systems and external data can help jump-start de-identification on in-house data; however, the most efficient way of utilizing existing systems and external data is unclear. This article investigates the transferability of a state-of-the-art neural clinical de-identification system, NeuroNER, across a variety of datasets, when it is modified architecturally for domain generalization and when it is trained strategically for domain transfer.
Methods and Materials: We conducted a comparative study of the transferability of NeuroNER using four clinical…
