De-Identification of French Unstructured Clinical Notes for Machine Learning Tasks

Yakini Tchouka; Jean-François Couchot; Maxime Coulmeau; David Laiymani; Philippe Selles; Azzedine Rahmani

arXiv:2209.09631 · cs.CR · October 9, 2023 · 1 citation

Open Access

TL;DR

This paper presents a new deep learning and differential privacy-based method for de-identifying French medical texts, ensuring patient privacy while maintaining data utility for AI applications.

Contribution

It introduces a comprehensive French-specific de-identification approach combining deep learning detection with differential privacy substitution, filling a gap in existing methods.
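
The two-stage idea described above (detect identifying spans, then substitute them with privacy-preserving surrogates) can be illustrated with a minimal sketch. This is not the authors' implementation: the regex detector stands in for a learned deep learning model, k-ary randomized response is swapped in as one simple mechanism that satisfies epsilon-local differential privacy for a single value from a finite domain, and the names `AGE_PATTERN`, `AGE_DOMAIN`, `k_randomized_response`, and `deidentify_ages` are hypothetical.

```python
import math
import random
import re

# Toy detector (assumption): matches ages written as "<number> ans".
# A real system would use a trained named-entity recognition model instead.
AGE_PATTERN = re.compile(r"\b(\d{1,3}) ans\b")
AGE_DOMAIN = list(range(0, 121))  # assumed finite domain for ages


def k_randomized_response(true_value, domain, epsilon, rng):
    """Return true_value with probability e^eps / (e^eps + k - 1),
    otherwise a uniformly chosen different value from domain."""
    if true_value not in domain:
        raise ValueError("true_value must belong to the domain")
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return true_value
    others = [v for v in domain if v != true_value]
    return rng.choice(others)


def deidentify_ages(text, epsilon, rng):
    """Detect ages, then substitute each with a randomized surrogate."""
    def substitute(match):
        age = int(match.group(1))
        if age not in AGE_DOMAIN:
            return "[AGE] ans"  # out-of-domain values are masked outright
        surrogate = k_randomized_response(age, AGE_DOMAIN, epsilon, rng)
        return f"{surrogate} ans"
    return AGE_PATTERN.sub(substitute, text)


note = "Patient de 67 ans, suivi depuis 2019."
print(deidentify_ages(note, epsilon=1.0, rng=random.Random(0)))
```

A smaller `epsilon` makes the surrogate closer to uniform over the domain (stronger privacy, lower utility), while a larger `epsilon` keeps the true value more often. The sketch handles only one entity type and leaves all other identifiers untouched.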

Findings

01 High detection accuracy on French medical data

02 Effective privacy protection demonstrated

03 Encouraging results on real hospital dataset

Abstract

Unstructured textual data are at the heart of health systems: liaison letters between doctors, operating reports, coding of procedures according to the ICD-10 standard, etc. The details included in these documents make it possible to know the patient better, to manage his or her care better, to study pathologies more closely, and to remunerate the associated medical acts accurately. All this seems to be (at least partially) within reach of today's artificial intelligence techniques. However, for obvious reasons of privacy protection, the designers of these AI systems do not have the legal right to access these documents as long as they contain identifying data. De-identifying these documents, i.e. detecting and deleting all identifying information present in them, is a legally necessary step for sharing this data between two complementary worlds. Over the last decade, several proposals have…

Taxonomy

Topics: Biomedical Text Mining and Ontologies