Named Entity Recognition in Unstructured Medical Text Documents

Cole Pearson; Naeem Seliya; Rushit Dave

arXiv:2110.15732 · cs.CL · November 1, 2021 · 1 citation

Open Access

TL;DR

This study evaluates how well OpenNLP and spaCy perform named entity recognition for de-identifying sensitive information in medical examination reports. Both achieve high F-measures, with spaCy performing best.

Contribution

It compares two NLP toolkits for removing PII from medical texts and finds that spaCy gives the best results when trained on a 70-30 split of the data.
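
The page does not say how the 70-30 split was drawn. A natural reading is 70% of the annotated reports for training and 30% for testing. The sketch below shows one way to make such a split reproducible. Shuffling with a fixed seed and splitting at the report level are assumptions, not the authors' documented procedure. The helper `train_test_split` and the report identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], train_fraction: float = 0.7,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` with a fixed seed and cut it into train/test parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(items)                   # leave the caller's sequence untouched
    random.Random(seed).shuffle(shuffled)    # local RNG: same seed -> same split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-ins for annotated IME reports (assumption, not the paper's data).
reports = [f"report_{i:03d}" for i in range(10)]
train, test = train_test_split(reports, train_fraction=0.7)
print(len(train), len(test))  # 7 3
```

The split is made over whole reports rather than individual sentences. This keeps text from one patient's report out of both partitions, so the test score is not inflated by leakage.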

Findings

01

Both platforms achieve high de-identification performance (f-measure > 0.9)

02

spaCy trained with a 70-30 split performs best

03

Effective PII removal in medical reports using NER tools
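
The F-measure in finding 01 is the harmonic mean of precision P and recall R: F = 2PR / (P + R). Below is a minimal sketch of entity-level scoring, assuming exact matching on span boundaries and label. The paper's matching criterion is not stated here, and partial-overlap scoring would give different numbers. The function name `prf` and the example spans are hypothetical.

```python
from typing import Set, Tuple

# An entity is (start_char, end_char, label); exact-match scoring is an assumption.
Entity = Tuple[int, int, str]


def prf(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F-measure under exact span-and-label matching."""
    tp = len(gold & predicted)                      # true positives: identical entities
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f


gold = {(0, 10, "PERSON"), (25, 35, "DATE"), (50, 62, "GPE")}
pred = {(0, 10, "PERSON"), (25, 35, "DATE"), (70, 80, "PERSON")}
p, r, f = prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.667 R=0.667 F=0.667
```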

Abstract

Physicians provide expert opinion to legal courts on the medical state of patients, including whether a patient is likely to have permanent or non-permanent injuries or ailments. An independent medical examination (IME) report summarizes a physician's medical opinion about a patient's health status based on the physician's expertise. IME reports contain private and sensitive information (personally identifiable information, or PII) that must be removed or randomly encoded before further research can be conducted. In our study, the IME physician is an orthopedic surgeon in private practice in the United States. The goal of this research is to perform named entity recognition (NER) to identify and subsequently remove or encode PII in IME reports prepared by the physician. We apply the NER toolkits of OpenNLP and spaCy, two freely available natural language processing…
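
Once an NER tool has found the PII spans, the remove/encode step described in the abstract is independent of the toolkit. The sketch below covers only that step. It assumes the spans arrive as (start_char, end_char, label) triples. With spaCy these could be built as `[(e.start_char, e.end_char, e.label_) for e in doc.ents]`, and OpenNLP output would need converting first. The surrogate scheme is a deterministic stand-in for the paper's random encoding, not the authors' method. Each distinct entity gets a label-numbered placeholder such as `[PERSON_1]`, and the helper `deidentify` is hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label) from some NER tool


def deidentify(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Replace each detected entity with a consistent surrogate such as [PERSON_1].

    The same label and surface string always map to the same surrogate, so repeated
    mentions of one patient stay linked without exposing the name. The mapping is
    returned too; in practice it must be stored securely or discarded.
    """
    mapping: Dict[str, str] = {}
    counters: Dict[str, int] = {}
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:          # overlapping span: keep the earlier one, skip this
            continue
        key = f"{label}:{text[start:end]}"
        if key not in mapping:      # first time this entity is seen: mint a surrogate
            counters[label] = counters.get(label, 0) + 1
            mapping[key] = f"[{label}_{counters[label]}]"
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(mapping[key])
        cursor = end
    pieces.append(text[cursor:])           # trailing text after the last entity
    return "".join(pieces), mapping


report = "Jane Doe was examined on 2019-03-14. Jane Doe reported knee pain."
found = [(0, 8, "PERSON"), (25, 35, "DATE"), (37, 45, "PERSON")]
encoded, table = deidentify(report, found)
print(encoded)  # [PERSON_1] was examined on [DATE_1]. [PERSON_1] reported knee pain.
```

Consistent surrogates, rather than a blanket [REDACTED] token, keep coreference intact, which may matter for later research on the reports. The trade-off is that the mapping becomes sensitive data in its own right.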

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques