Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records
Jan Trienes, Dolf Trieschnigg, Christin Seifert, Djoerd Hiemstra

TL;DR
This study compares rule-based, feature-based, and deep neural methods for de-identifying Dutch medical records, demonstrating the neural approach's superior generalizability and ease of use across languages and domains.
Contribution
It provides a comprehensive evaluation of de-identification methods on a new Dutch healthcare dataset, highlighting the effectiveness of neural models and releasing resources for future research.
Findings
Neural methods outperform rule-based and feature-based approaches.
An existing rule-based method developed specifically for Dutch does not generalize well to the new Dutch healthcare data.
Neural models require less configuration and domain knowledge.
Abstract
Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from nine institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to…
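To make the rule-based family of methods concrete, here is a minimal sketch of pattern-based de-identification in Python. The patterns, tag names, and the `deidentify` function are hypothetical and chosen only for illustration; they are not the rules of the Dutch method evaluated in the paper.

```python
import re

# Hypothetical patterns for illustration only; a real rule-based system
# would combine many more rules with name and location lookup lists.
PATTERNS = [
    # Numeric dates such as 12-03-2019
    ("DATE", re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b")),
    # Ten-digit phone numbers starting with 0
    ("PHONE", re.compile(r"\b0\d{9}\b")),
    # A title (dhr., mevr., dr.) followed by a capitalized surname
    ("NAME", re.compile(r"\b(?:[Dd]hr|[Mm]evr|[Dd]r)\.\s+[A-Z][a-z]+")),
]


def deidentify(text: str) -> str:
    """Replace every pattern match with a placeholder tag like <DATE>."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"<{label}>", text)
    return text


if __name__ == "__main__":
    print(deidentify("Mevr. Jansen belde op 12-03-2019 via 0612345678."))
    # A date written out in words is not covered by the rules above.
    print(deidentify("Opname op 12 maart 2019."))
```

In this sketch the second sentence passes through unchanged because no rule covers a date written out in words, which is one way hand-written rules can fail on text from a new institute or domain.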
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare
Methods: Test
