Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation
Michele Miranda, Xinlan Yan, Nishant Mishra, Rachel Murphy, Ameen Abu-Hanna, Sébastien Bratières, Iacer Calixto

TL;DR
This study compares differential privacy, named entity recognition, and large language models for de-identifying Dutch clinical notes, highlighting how hybrid approaches improve privacy and utility.
Contribution
First comprehensive comparison of DP, NER, and LLM methods for Dutch clinical text de-identification, including hybrid strategies and performance assessment.
Findings
DP alone reduces utility significantly
Hybrid methods with LLM preprocessing improve privacy-utility balance
Combining DP with linguistic preprocessing enhances de-identification effectiveness
Abstract
Protecting patient privacy in clinical narratives is essential for enabling secondary use of healthcare data under regulations such as GDPR and HIPAA. While manual de-identification remains the gold standard, it is costly and slow, motivating the need for automated methods that combine privacy guarantees with high utility. Most automated text de-identification pipelines employ named entity recognition (NER) to identify protected entities for redaction. Methods based on differential privacy (DP) provide formal privacy guarantees, and, more recently, large language models (LLMs) are also increasingly used for text de-identification in the clinical domain. In this work, we present the first comparative study of DP, NER, and LLMs for Dutch clinical text de-identification. We investigate these methods separately as well as hybrid strategies that apply NER or LLM preprocessing prior to…
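In an NER-based pipeline, the final step replaces each detected protected entity with a placeholder. The snippet below is a minimal sketch of that redaction step only, assuming the entity spans (character offsets plus a label) have already been produced by some NER model or LLM. The `EntitySpan` type, the `redact` function, the label names, and the `[LABEL]` placeholder format are hypothetical illustrations, not taken from the paper, and the sketch does not cover the DP component.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical entity span as an NER model or LLM might return it."""
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical label set, e.g. "NAME", "DATE"


def redact(text: str, spans: list[EntitySpan]) -> str:
    """Replace each entity span with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    # Process spans left to right so the original text is rebuilt in order.
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            # Skip a span that overlaps an earlier one; a real pipeline
            # might merge overlapping detections instead.
            continue
        pieces.append(text[cursor:span.start])  # untouched text before the entity
        pieces.append(f"[{span.label}]")        # placeholder replaces the entity
        cursor = span.end
    pieces.append(text[cursor:])  # remaining text after the last entity
    return "".join(pieces)


if __name__ == "__main__":
    note = "Dhr. Jan de Vries werd op 12-03-2021 opgenomen."
    name_start = note.index("Jan de Vries")
    date_start = note.index("12-03-2021")
    spans = [
        EntitySpan(date_start, date_start + len("12-03-2021"), "DATE"),
        EntitySpan(name_start, name_start + len("Jan de Vries"), "NAME"),
    ]
    print(redact(note, spans))  # Dhr. [NAME] werd op [DATE] opgenomen.
```

Sorting by start offset makes the result independent of the order in which the detector reports entities; how overlaps, label granularity, and surrogate replacement (instead of placeholders) are handled are design choices left open here.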
