Benchmarking Modern Named Entity Recognition Techniques for Free-text Health Record De-identification

Abdullah Ahmed; Adeel Abbasi; Carsten Eickhoff

arXiv:2103.13546 · cs.CL · March 26, 2021 · 1 citation

TL;DR

This paper benchmarks several deep learning named entity recognition (NER) methods for automatically identifying and removing protected health information (PHI) from electronic health records (EHRs). The goal is to let the records be shared for research while preserving patient privacy.
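Under this framing, an NER model tags each token. The tag may be B-NAME (beginning of a name), I-NAME (inside a name) or O (not PHI), and every tagged span is then removed or replaced. The sketch below shows that post-processing step in Python. It assumes BIO-style tags. The helpers `bio_to_spans` and `deidentify`, the tag names and the `[LABEL]` placeholder format are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# A span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]

def bio_to_spans(tags: List[str]) -> List[Span]:
    """Hypothetical helper: convert BIO tags into labelled token spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity begins; a stray I- tag is treated like B-.
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag == "O":
            # Close any open entity.
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
        # Otherwise: an I- tag continuing the current entity.
    if start is not None:
        spans.append((start, len(tags), label))
    return spans

def deidentify(tokens: List[str], tags: List[str]) -> str:
    """Hypothetical helper: replace each PHI span with a [LABEL] placeholder."""
    out: List[str] = []
    i = 0
    for start, end, label in bio_to_spans(tags):
        out.extend(tokens[i:start])
        out.append(f"[{label}]")
        i = end
    out.extend(tokens[i:])
    return " ".join(out)

tokens = ["Mr.", "John", "Smith", "was", "seen", "on", "03/14/2019", "."]
tags = ["O", "B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "O"]
print(deidentify(tokens, tags))  # Mr. [NAME] was seen on [DATE] .
```

Replacing spans with category placeholders instead of deleting them keeps the text readable for downstream use. This summary does not say whether PHI should be masked, deleted or replaced with surrogate values.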

Contribution

It systematically compares multiple NER techniques on EHR data. It identifies BiLSTM-CRF as the best-performing combination and examines the effects of character embeddings and transformers.

Findings

1. BiLSTM-CRF is the best-performing encoder/decoder combination for de-identification.
2. Character embeddings and CRFs tend to improve precision at the cost of recall.
3. Transformers alone underperform as context encoders.
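The second finding is stated in terms of precision and recall:

- Precision is the fraction of predicted PHI spans that are correct.
- Recall is the fraction of true PHI spans that the model found.

The Python sketch below computes both and assumes strict entity-level matching, so a prediction counts only if its start, end and label all match a gold span. This summary does not state the paper's exact evaluation protocol, and the spans below are made up for illustration.

```python
from typing import Set, Tuple

# A span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]

def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Strict entity-level scores: a prediction is correct only on an exact match."""
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = {(1, 3, "NAME"), (6, 7, "DATE"), (10, 11, "HOSPITAL")}
# A cautious model tags only the span it is sure of.
cautious = {(1, 3, "NAME")}
# An eager model finds every gold span plus one false alarm.
eager = gold | {(12, 13, "NAME")}

print(precision_recall_f1(gold, cautious))  # precision 1.0, recall 1/3
print(precision_recall_f1(gold, eager))     # precision 0.75, recall 1.0
```

In de-identification, a missed span is PHI left in the shared text. A gain in precision that costs recall can therefore be a poor trade even when F1 changes little.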

Abstract

Electronic Health Records (EHRs) have become the primary form of medical data-keeping across the United States. Federal law restricts the sharing of any EHR data that contains protected health information (PHI). De-identification, the process of identifying and removing all PHI, is crucial for making EHR data publicly available for scientific research. This project explores several deep learning-based named entity recognition (NER) methods to determine which method(s) perform best on the de-identification task. We trained and tested our models on the i2b2 training dataset, and qualitatively assessed their performance using EHR data collected from a local hospital. We found that 1) BiLSTM-CRF represents the best-performing encoder/decoder combination, 2) character embeddings and CRFs tend to improve precision at the price of recall, and 3) transformers alone underperform as context…
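The abstract reports BiLSTM-CRF as the best encoder/decoder pairing. In that design, the BiLSTM encoder assigns each token a score for every tag. The CRF decoder then chooses the tag sequence with the highest total score, adding transition scores between adjacent tags.

The Python sketch below shows only that decoding step (Viterbi search) and assumes the emission scores are already computed. Several parts are illustrative simplifications:

- the three-tag set;
- the hand-set -100 penalty on O → I-NAME;
- the omission of start and end transitions.

A trained CRF would learn its transition scores.

```python
from typing import List, Sequence

def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. a BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag to come from when landing on tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path

tags = ["O", "B-NAME", "I-NAME"]
emissions = [[1.0, 0.8, 0.0], [0.0, 0.0, 2.0]]
transitions = [
    [0.0, 0.0, -100.0],  # from O: jumping straight to I-NAME is penalised
    [0.0, 0.0, 0.0],     # from B-NAME
    [0.0, 0.0, 0.0],     # from I-NAME
]
greedy = [max(range(len(tags)), key=lambda j: row[j]) for row in emissions]
best = viterbi(emissions, transitions)
print([tags[j] for j in greedy])  # ['O', 'I-NAME']: an invalid BIO sequence
print([tags[j] for j in best])    # ['B-NAME', 'I-NAME']
```

Picking the best tag for each token independently would emit I-NAME without a preceding B-NAME. The transition penalty lets the decoder accept a slightly lower score on the first token in exchange for a valid sequence with a higher total score.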

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management