DeIDClinic: A Multi-Layered Framework for De-identification of Clinical Free-text Data

Angel Paul; Dhivin Shaji; Lifeng Han; Warren Del-Pinto; Goran Nenadic

arXiv:2410.01648 · cs.CL · October 3, 2024

TL;DR

DeIDClinic is a multi-layered de-identification framework that strengthens privacy protection for clinical text by integrating ClinicalBERT with traditional dictionary-lookup and rule-based methods. It reaches an entity-recognition F1-score of 0.9732 and offers customisable privacy controls.

Contribution

This work introduces DeIDClinic, a novel framework combining deep learning and rule-based methods for improved clinical text de-identification.

Findings

1. Achieves an F1-score of 0.9732 in entity recognition
2. Effectively identifies names, dates, and locations
3. Provides a risk assessment for privacy levels
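For reference, an entity-recognition F1-score such as the 0.9732 above is conventionally the harmonic mean of precision P and recall R computed from true positives (TP), false positives (FP), and false negatives (FN). This page does not state whether matching is strict (exact span and type) or token-level, so the definitions below are the standard ones, not necessarily the paper's exact evaluation protocol.

```latex
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}
```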

Abstract

De-identification is important for protecting patients' privacy in healthcare text analytics. The MASK framework is one of the best-performing systems on the de-identification shared task organised by the n2c2/i2b2 challenges. This work enhances the MASK framework by integrating ClinicalBERT, a deep learning model specifically fine-tuned on clinical texts, alongside traditional de-identification methods such as dictionary lookup and rule-based approaches. The system identifies sensitive identifiable entities within clinical documents and either redacts or replaces them, while also allowing users to customise the masked documents according to their specific needs. The integration of ClinicalBERT significantly improves entity-recognition performance, achieving an F1-score of 0.9732, especially for common entities such as names, dates, and locations. A risk assessment feature has also been developed,…
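The layered detect-then-mask flow described in the abstract can be illustrated with a minimal Python sketch. It is not the DeIDClinic implementation. The ClinicalBERT layer is represented by a hypothetical stub callable (`stub_model`), and the date regex, lexicon, overlap-resolution rule, and surrogate values are all illustrative assumptions. The `labels` filter is just one possible form of the user customisation the abstract mentions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Entity:
    start: int  # character offset (inclusive)
    end: int    # character offset (exclusive)
    label: str  # e.g. "NAME", "DATE", "LOCATION"


# Rule layer (assumed pattern): numeric dates such as 03/10/2024 or 2024-10-03.
DATE_RE = re.compile(r"\b(?:\d{1,2}/\d{1,2}/\d{4}|\d{4}-\d{2}-\d{2})\b")


def rule_layer(text: str) -> List[Entity]:
    return [Entity(m.start(), m.end(), "DATE") for m in DATE_RE.finditer(text)]


def dictionary_layer(text: str, lexicon: Dict[str, str]) -> List[Entity]:
    # Exact, whole-word lookup of each known term (e.g. hospital names).
    found = []
    for term, label in lexicon.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append(Entity(m.start(), m.end(), label))
    return found


def merge(layers: Iterable[List[Entity]]) -> List[Entity]:
    # Keep the earliest-starting span (the longest one on ties) and drop
    # any span that overlaps a span already kept. Output is sorted by start.
    spans = sorted((e for layer in layers for e in layer),
                   key=lambda e: (e.start, -(e.end - e.start)))
    kept: List[Entity] = []
    for e in spans:
        if kept and e.start < kept[-1].end:
            continue
        kept.append(e)
    return kept


def mask(text: str, entities: List[Entity], mode: str = "redact",
         surrogates: Optional[Dict[str, str]] = None,
         labels: Optional[Set[str]] = None) -> str:
    # mode="redact" inserts a [LABEL] tag; mode="replace" inserts a surrogate
    # value when one is given for that label. `labels` restricts masking to
    # the chosen entity types (None means mask every detected entity).
    out, last = [], 0
    for e in entities:
        if labels is not None and e.label not in labels:
            continue
        out.append(text[last:e.start])
        if mode == "replace" and surrogates and e.label in surrogates:
            out.append(surrogates[e.label])
        else:
            out.append(f"[{e.label}]")
        last = e.end
    out.append(text[last:])
    return "".join(out)


def deidentify(text: str, model_layer: Callable[[str], List[Entity]],
               lexicon: Dict[str, str], **mask_opts) -> str:
    # model_layer stands in for a ClinicalBERT token-classification step.
    entities = merge([model_layer(text), dictionary_layer(text, lexicon),
                      rule_layer(text)])
    return mask(text, entities, **mask_opts)


# Hypothetical stub in place of a fine-tuned model: tags one known name.
def stub_model(text: str) -> List[Entity]:
    i = text.find("John Smith")
    return [Entity(i, i + len("John Smith"), "NAME")] if i >= 0 else []


note = "John Smith was admitted to Riverside Clinic on 03/10/2024."
lexicon = {"Riverside Clinic": "LOCATION"}
print(deidentify(note, stub_model, lexicon))
# [NAME] was admitted to [LOCATION] on [DATE].
print(deidentify(note, stub_model, lexicon, mode="replace",
                 surrogates={"NAME": "Jane Doe", "DATE": "01/01/2000"},
                 labels={"NAME", "DATE"}))
# Jane Doe was admitted to Riverside Clinic on 01/01/2000.
```

Keeping each detector as a separate function that returns character spans makes it easy to add or drop a layer. The greedy overlap rule is a simplification; a real system would also need a policy for layers that disagree on the label of the same span.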

Code & Models

Repositories

angelpaulml17/DeIDClinic (PyTorch, official)

Taxonomy

Topics: Electronic Health Records Systems