In the Name of Fairness: Assessing the Bias in Clinical Record De-identification
Yuxin Xiao, Shulammite Lim, Tom Joseph Pollard, Marzyeh Ghassemi

TL;DR
This study evaluates bias in clinical record de-identification systems, revealing significant demographic disparities and proposing fine-tuning as a mitigation strategy to promote fairness in data sharing.
Contribution
The paper provides a large-scale empirical analysis of demographic bias in de-identification systems and introduces a simple fine-tuning approach to reduce performance gaps.
Findings
De-identification systems show significant performance disparities across demographic groups.
De-identification quality is affected by polysemy in names and by the surrounding context.
Fine-tuning on diverse names improves fairness.
Abstract
Data sharing is crucial for open science and reproducible research, but the legal sharing of clinical data requires the removal of protected health information from electronic health records. This process, known as de-identification, is often achieved through the use of machine learning algorithms by many commercial and open-source systems. While these systems have shown compelling results on average, the variation in their performance across different demographic groups has not been thoroughly examined. In this work, we investigate the bias of de-identification systems on names in clinical notes via a large-scale empirical analysis. To achieve this, we create 16 name sets that vary along four demographic dimensions: gender, race, name popularity, and the decade of popularity. We insert these names into 100 manually curated clinical templates and evaluate the performance of nine public…
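The evaluation setup described in the abstract (insert names from demographic name sets into templates, run a de-identification system, compare performance per group) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the templates, name sets, group labels, and the lexicon-based `toy_deidentify` detector are all hypothetical placeholders, and recall is used here as one plausible metric.

```python
from collections import defaultdict

# Hypothetical templates with a {name} slot
# (the paper uses 100 manually curated clinical templates).
TEMPLATES = [
    "Patient {name} was admitted with chest pain.",
    "Spoke with {name} regarding discharge plans.",
]

# Hypothetical name sets keyed by group; placeholders, not the paper's 16 sets.
NAME_SETS = {
    "group_a": ["Alice", "Maria"],
    "group_b": ["April", "Hope"],
}

# Toy stand-in for a de-identification system: it only flags names
# that appear in a fixed lexicon (an assumption for illustration).
KNOWN_NAMES = {"Alice", "Maria", "April"}


def toy_deidentify(text):
    """Return the set of tokens the toy system marks as names."""
    tokens = [t.strip(".,") for t in text.split()]
    return {t for t in tokens if t in KNOWN_NAMES}


def recall_by_group(templates, name_sets, detector):
    """Fraction of inserted names that the detector flags, per group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, names in name_sets.items():
        for name in names:
            for template in templates:
                note = template.format(name=name)
                totals[group] += 1
                if name in detector(note):
                    hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}


recalls = recall_by_group(TEMPLATES, NAME_SETS, toy_deidentify)
# Largest difference in recall between any two groups.
gap = max(recalls.values()) - min(recalls.values())
print(recalls, gap)
```

In this toy setup the name "Hope" is missed because it is absent from the lexicon, which loosely mirrors how a name that doubles as a common word can be harder to detect. A real evaluation would replace `toy_deidentify` with an actual de-identification system.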
Taxonomy
Topics: Ethics in Clinical Research · Electronic Health Records Systems · Global Cancer Incidence and Screening
