De-Identification of French Unstructured Clinical Notes for Machine Learning Tasks
Yakini Tchouka, Jean-François Couchot, Maxime Coulmeau, David Laiymani, Philippe Selles, Azzedine Rahmani

TL;DR
This paper presents a new deep learning and differential privacy-based method for de-identifying French medical texts, ensuring patient privacy while maintaining data utility for AI applications.
Contribution
It introduces a comprehensive French-specific de-identification approach combining deep learning detection with differential privacy substitution, filling a gap in existing methods.
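The detect-then-substitute idea can be illustrated with a minimal Python sketch. This is not the authors' implementation. The regex detector is only a placeholder for the deep learning detector the paper describes. The exponential mechanism is one standard differential privacy primitive, chosen here for illustration, and the summary above does not say which mechanism the paper uses. The names `SURROGATES`, `NAME_PATTERN`, `utility`, `exponential_mechanism`, and `deidentify` are hypothetical.

```python
import math
import random
import re

# Hypothetical surrogate pool (assumption); a real system would use a far larger list.
SURROGATES = ["Martin", "Bernard", "Dubois", "Moreau", "Laurent", "Durand"]

# Placeholder detector (assumption): a title followed by a capitalized word.
# The paper relies on a deep learning detector instead of a regex.
NAME_PATTERN = re.compile(r"\b(M\.|Mme|Dr)\s+([A-ZÀ-Ý][a-zà-ÿ]+)")


def utility(original, candidate):
    """Toy utility in {0, 1}: 1 if the first letters match, else 0."""
    return 1.0 if original[0].lower() == candidate[0].lower() else 0.0


def exponential_mechanism(original, candidates, epsilon, rng):
    """Sample a candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    sensitivity = 1.0  # the toy utility changes by at most 1 between inputs
    weights = [
        math.exp(epsilon * utility(original, c) / (2 * sensitivity))
        for c in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def deidentify(text, epsilon=1.0, seed=None):
    """Replace each detected name with a randomly sampled surrogate."""
    rng = random.Random(seed)

    def replace(match):
        title, name = match.group(1), match.group(2)
        surrogate = exponential_mechanism(name, SURROGATES, epsilon, rng)
        return f"{title} {surrogate}"

    return NAME_PATTERN.sub(replace, text)


note = "Le patient a été adressé par Dr Lefebvre à Mme Dupont."
print(deidentify(note, epsilon=1.0, seed=0))
```

A smaller `epsilon` makes the sampling closer to uniform over the surrogate pool (more privacy, less resemblance to the original), while a larger `epsilon` favors surrogates with higher utility.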
Findings
High detection accuracy on French medical data
Effective privacy protection demonstrated
Encouraging results on real hospital dataset
Abstract
Unstructured textual data are at the heart of health systems: liaison letters between doctors, operating reports, coding of procedures according to the ICD-10 standard, etc. The details included in these documents make it possible to get to know the patient better, to better manage him or her, to better study the pathologies, to accurately remunerate the associated medical acts... All this seems to be (at least partially) within reach of today's artificial intelligence techniques. However, for obvious reasons of privacy protection, the designers of these AIs do not have the legal right to access these documents as long as they contain identifying data. De-identifying these documents, i.e. detecting and deleting all identifying information present in them, is a legally necessary step for sharing this data between two complementary worlds. Over the last decade, several proposals have…
Taxonomy
Topics: Biomedical Text Mining and Ontologies
