Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation

Michele Miranda; Xinlan Yan; Nishant Mishra; Rachel Murphy; Ameen Abu-Hanna; Sébastien Bratières; Iacer Calixto

arXiv:2604.21421 · cs.CR · April 24, 2026

TL;DR

This study compares differential privacy, named entity recognition, and large language models for de-identifying Dutch clinical notes, and reports that hybrid approaches improve the privacy-utility balance.

Contribution

First comprehensive comparison of DP, NER, and LLM methods for Dutch clinical text de-identification, including hybrid strategies and performance assessment.

Findings

01. DP alone reduces utility significantly

02. Hybrid methods with LLM preprocessing improve privacy-utility balance

03. Combining DP with linguistic preprocessing enhances de-identification effectiveness

Abstract

Protecting patient privacy in clinical narratives is essential for enabling secondary use of healthcare data under regulations such as GDPR and HIPAA. While manual de-identification remains the gold standard, it is costly and slow, motivating automated methods that combine privacy guarantees with high utility. Most automated text de-identification pipelines employ named entity recognition (NER) to identify protected entities for redaction. Methods based on differential privacy (DP) instead provide formal privacy guarantees, and more recently large language models (LLMs) have been increasingly used for text de-identification in the clinical domain. In this work, we present the first comparative study of DP, NER, and LLMs for Dutch clinical text de-identification. We investigate these methods separately as well as hybrid strategies that apply NER or LLM preprocessing prior to…
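To make the idea of a hybrid strategy concrete, here is a minimal Python sketch of "preprocess, then apply DP". It is not the authors' pipeline: the abstract does not specify their DP mechanism or models. The regex patterns are a hypothetical stand-in for an NER or LLM tagger, the vocabulary is a toy assumption, and the DP step is k-ary randomized response, swapped in because it is simple to state and verify.

```python
import math
import random
import re

# Hypothetical stand-in for an NER/LLM tagger: two toy regex patterns.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b"),
    "PHONE": re.compile(r"\b0\d{9}\b"),
}


def redact(text):
    """Replace every pattern match with a bracketed label such as [DATE]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def keep_probability(epsilon, k):
    """Probability of reporting the true token under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def randomized_response(token, vocab, epsilon, rng):
    """Perturb one token over a fixed vocabulary (epsilon-local DP per token)."""
    if token not in vocab:
        token = "<unk>"  # out-of-vocabulary tokens are mapped into the domain first
    if rng.random() < keep_probability(epsilon, len(vocab)):
        return token
    # Otherwise report one of the other k-1 tokens uniformly at random.
    return rng.choice([w for w in vocab if w != token])


def hybrid_deidentify(text, vocab, epsilon, rng):
    """Redact first, then perturb each whitespace-separated token."""
    tokens = redact(text).split()
    return " ".join(randomized_response(t, vocab, epsilon, rng) for t in tokens)


if __name__ == "__main__":
    # Toy vocabulary (an assumption); it must contain "<unk>".
    vocab = ["<unk>", "gezien", "op", "tel", "patient", "[DATE]", "[PHONE]"]
    note = "patient gezien op 12-03-2021 tel 0612345678"
    print(hybrid_deidentify(note, vocab, epsilon=2.0, rng=random.Random(0)))
```

In this sketch the guarantee holds per token; under basic sequential composition, a note of n tokens spends n times epsilon in total. A smaller epsilon replaces more tokens, which illustrates why applying DP alone can cost utility and why removing identifiers beforehand is attractive.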
