A Deep Learning Architecture for De-identification of Patient Notes: Implementation and Evaluation
Kaung Khin, Philipp Burckhardt, Rema Padman

TL;DR
This paper introduces a deep learning model utilizing contextualized embeddings and variational dropout Bi-LSTMs for de-identifying patient notes, achieving state-of-the-art results efficiently without external knowledge sources.
Contribution
The paper presents a novel deep learning architecture that improves de-identification of clinical notes by leveraging recent NLP advances, outperforming existing methods.
Findings
Achieves state-of-the-art performance on two datasets.
Converges faster than previous models.
Does not require dictionaries or external knowledge sources.
Abstract
De-identification is the process of removing the 18 types of protected health information (PHI) from clinical notes so that the text can be considered not individually identifiable. Recent advances in natural language processing (NLP) have allowed for the use of deep learning techniques for the task of de-identification. In this paper, we present a deep learning architecture that builds on the latest NLP advances by incorporating deep contextualized word embeddings and variational dropout Bi-LSTMs. We test this architecture on two gold-standard datasets and show that the architecture achieves state-of-the-art performance on both datasets while also converging faster than other systems, without the use of dictionaries or other knowledge sources.
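The abstract names variational dropout as one ingredient of the Bi-LSTM but gives no implementation details. As a rough illustration only, the sketch below assumes the usual meaning of the term: a single dropout mask is sampled per sequence and reused at every time step, instead of resampling a new mask at each step. The function names, the list-of-lists input format, and the dropout rate are hypothetical and are not taken from the paper.

```python
import random


def variational_dropout_mask(num_features, drop_prob, rng):
    """Sample one inverted-dropout mask over the feature dimension.

    Kept units are scaled by 1 / keep_prob so the expected value is unchanged.
    """
    keep_prob = 1.0 - drop_prob
    return [
        (1.0 / keep_prob) if rng.random() < keep_prob else 0.0
        for _ in range(num_features)
    ]


def apply_variational_dropout(sequence, drop_prob, rng=None):
    """Apply the SAME dropout mask to every time step of one sequence.

    sequence: list of time steps, each a list of floats of equal length
    (an assumed, simplified stand-in for a tensor of hidden states).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not sequence:
        return []
    if rng is None:
        rng = random.Random()
    # The mask is sampled once here, outside the loop over time steps.
    mask = variational_dropout_mask(len(sequence[0]), drop_prob, rng)
    return [[x * m for x, m in zip(step, mask)] for step in sequence]


if __name__ == "__main__":
    rng = random.Random(0)
    # Four time steps with eight features each, all ones for readability.
    seq = [[1.0] * 8 for _ in range(4)]
    for step in apply_variational_dropout(seq, 0.5, rng):
        print(step)
```

Because the mask is fixed within a sequence, the same feature positions are zeroed at every time step, so each printed row shows the same pattern of dropped units. In a real model the same idea would be applied to the inputs and recurrent connections of the LSTM, which this sketch does not attempt to reproduce.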
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
