Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain

Lukas Lange; Heike Adel; Jannik Strötgen

arXiv:2005.09397·cs.CL·May 20, 2020

TL;DR

This paper investigates how de-identification affects concept extraction in clinical texts and proposes joint models for the two tasks, setting a new state of the art on English and Spanish benchmark datasets.

Contribution

It introduces two joint models for de-identification and concept extraction, a stacked model with restricted access to privacy-sensitive information and a multitask model, and establishes new state-of-the-art results on English and Spanish benchmark datasets.

Findings

01. Achieved 96.1% F1 for de-identification on English benchmark data.

02. Achieved 88.9% F1 for concept extraction on English benchmark data.

03. Achieved 91.4% F1 for concept extraction on Spanish benchmark data.

Abstract

Exploiting natural language processing in the clinical domain requires de-identification, i.e., anonymization of personal information in texts. However, current research considers de-identification and downstream tasks, such as concept extraction, only in isolation and does not study the effects of de-identification on other tasks. In this paper, we close this gap by reporting concept extraction performance on automatically anonymized data and investigating joint models for de-identification and concept extraction. In particular, we propose a stacked model with restricted access to privacy-sensitive information and a multitask model. We set the new state of the art on benchmark datasets in English (96.1% F1 for de-identification and 88.9% F1 for concept extraction) and Spanish (91.4% F1 for concept extraction).
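
To make the idea of running concept extraction on automatically anonymized data concrete, here is a minimal sketch of the masking step that could sit between a de-identifier and a concept extractor. It is an illustration under stated assumptions, not the authors' implementation: the function name `mask_phi`, the BIO tag scheme, the `NAME` and `DATE` labels, and the angle-bracket placeholders are all hypothetical choices not specified in the abstract.

```python
from typing import List


def mask_phi(tokens: List[str], tags: List[str]) -> List[str]:
    """Replace each tagged privacy-sensitive span with one placeholder.

    Assumption: `tags` are BIO labels (e.g. "B-NAME", "I-NAME", "O")
    produced by some de-identification tagger. The tag set is hypothetical.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    masked: List[str] = []
    prev_type = None  # type of the sensitive span we are currently inside
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Non-sensitive token: keep it unchanged.
            masked.append(token)
            prev_type = None
            continue
        prefix, _, phi_type = tag.partition("-")
        # Emit one placeholder per span: on a B- tag or when the type changes.
        if prefix == "B" or phi_type != prev_type:
            masked.append(f"<{phi_type}>")
        prev_type = phi_type
    return masked


tokens = ["Mr.", "John", "Smith", "was", "given", "aspirin", "on", "May", "3", "."]
tags = ["O", "B-NAME", "I-NAME", "O", "O", "O", "O", "B-DATE", "I-DATE", "O"]
masked = mask_phi(tokens, tags)
print(" ".join(masked))
```

A downstream concept extractor would then see only the masked tokens, so a medical term such as "aspirin" stays visible while the name and date do not.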

Code & Models

Repositories

boschresearch/joint_anonymization_extraction (PyTorch, official)
