Quantifying the effects of pseudonymisation on epidemiological research reliability: a tailored evaluation using a clinical data warehouse
Ariel Cohen, Yannick Jacob, Gilles Chatellier, Charline Jean, Benoît Playe, Alexandre Mouchet, Etienne Audureau, Antoine Boutet, Romain Bey

TL;DR
This study evaluates how pseudonymisation techniques affect the reliability of epidemiological research using electronic health records while balancing privacy and data utility.
Contribution
The paper introduces a tailored evaluation framework to assess pseudonymisation's impact on epidemiological studies using real-world clinical data.
Findings
Pseudonymisation reduced re-identification risk but was less effective than data minimisation.
Attack success rates varied significantly between random-target and target-in-cohort scenarios.
Maintaining low uniqueness required altering temporal coherence, affecting study reliability.
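As an illustration of the tension between uniqueness and temporal coherence noted above, the following minimal Python sketch contrasts two hypothetical date-shifting schemes. The function names (`uniqueness`, `shift_per_patient`, `shift_per_event`), the `max_days` parameter, and the toy records are assumptions made for illustration; they are not the algorithms or data evaluated in the paper.

```python
from collections import Counter
from datetime import date, timedelta
import random


def uniqueness(records):
    """Fraction of records whose attribute tuple appears exactly once."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)


def shift_per_patient(visits, max_days, seed=0):
    """Shift all dates of a patient by one shared random offset.

    Intervals between a patient's own events are preserved
    (hypothetical scheme, for illustration only).
    """
    rng = random.Random(seed)
    offsets = {}
    shifted = []
    for patient_id, day in visits:
        if patient_id not in offsets:
            offsets[patient_id] = timedelta(days=rng.randint(-max_days, max_days))
        shifted.append((patient_id, day + offsets[patient_id]))
    return shifted


def shift_per_event(visits, max_days, seed=0):
    """Shift every date independently.

    Adds more noise per event, but intervals between a patient's
    events are no longer preserved (hypothetical scheme).
    """
    rng = random.Random(seed)
    return [
        (patient_id, day + timedelta(days=rng.randint(-max_days, max_days)))
        for patient_id, day in visits
    ]


# Toy data: two visits for patient "p1", one for "p2" (invented values).
visits = [
    ("p1", date(2020, 1, 1)),
    ("p1", date(2020, 1, 11)),
    ("p2", date(2020, 3, 5)),
]
coherent = shift_per_patient(visits, max_days=30)
noisy = shift_per_event(visits, max_days=30)
```

With a shared per-patient offset, the 10-day gap between the two `p1` visits survives the shift. Independent per-event shifts generally change that gap, which is the kind of loss of temporal coherence that can affect analyses depending on event ordering or delays.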
Abstract
Electronic health records (EHRs) hold immense potential for advancing medical research, but protecting patient privacy remains a critical challenge. Consequently, the choice of privacy-enhancing techniques must take into account the downstream analyses to preserve relevant data properties, often resulting in a trade-off between data utility and privacy. We aimed to evaluate different pseudonymisation algorithms and their impact in the context of six representative archetypal electronic health record epidemiological studies. This work seeks to empower Clinical Data Warehouse (CDW) stakeholders to make informed decisions that minimise privacy risks while ensuring information utility. We simulated various re-identification attempts conducted by an attacker with legitimate access to cohorts contained in the CDW of the Greater Paris University Hospitals. The dataset comprised 3,950,145…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Electronic Health Records Systems · Ethics in Clinical Research
