Performance of Automatic De-identification Across Different Note Types

Nicholas Dobbins; David Wayne; Kahyun Lee; Özlem Uzuner; Meliha Yetisgen

arXiv:2102.11032 · cs.CL · February 23, 2021 · 1 citation

Open Access

TL;DR

This study evaluates the effectiveness of a state-of-the-art de-identification system, NeuroNER, in detecting protected health information across various clinical note types and institutions, highlighting its performance variability.

Contribution

It provides an empirical assessment of NeuroNER's performance on diverse clinical notes and compares training data sources, informing future de-identification system development.

Findings

01

NeuroNER performs variably across note types and institutions.

02

Training on institution-specific data improves de-identification accuracy.

03

Performance differs significantly when models are trained on external versus local data.

Abstract

Free-text clinical notes detail all aspects of patient care and have great potential to facilitate quality improvement and assurance initiatives as well as advance clinical research. However, concerns about patient privacy and confidentiality limit the use of clinical notes for research. As a result, the information documented in these notes remains unavailable to most researchers. De-identification (de-id), i.e., locating and removing personally identifying protected health information (PHI), is one way of improving access to clinical narratives. However, few off-the-shelf de-identification systems can consistently detect PHI across different data sources and medical specialties. In this abstract, we present the performance of a state-of-the-art de-id system called NeuroNER on a diverse set of notes from the University of Washington (UW) when the models are trained on…

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Electronic Health Records Systems · Data Quality and Management