# Quantifying the effects of pseudonymisation on epidemiological research reliability: a tailored evaluation using a clinical data warehouse

**Authors:** Ariel Cohen, Yannick Jacob, Gilles Chatellier, Charline Jean, Benoît Playe, Alexandre Mouchet, Etienne Audureau, Antoine Boutet, Romain Bey

PMC · DOI: 10.1186/s12911-026-03360-0 · 2026-02-19

## TL;DR

This study evaluates how pseudonymisation techniques applied to electronic health records affect the reliability of epidemiological research, weighing re-identification risk against data utility.

## Contribution

The paper introduces a tailored evaluation framework to assess pseudonymisation's impact on epidemiological studies using real-world clinical data.

## Key findings

- Pseudonymisation reduced re-identification risk but was less effective than data minimisation.
- Attack success rates varied significantly between random-target and target-in-cohort scenarios.
- Maintaining low uniqueness required altering temporal coherence, affecting study reliability.

## Abstract

Electronic health records (EHRs) hold immense potential for advancing medical research, but protecting patient privacy remains a critical challenge. The choice of privacy-enhancing techniques must therefore account for the downstream analyses so that the relevant data properties are preserved, which often results in a trade-off between data utility and privacy. We aimed to evaluate different pseudonymisation algorithms and their impact in the context of six representative, archetypal epidemiological studies based on electronic health records. This work seeks to empower Clinical Data Warehouse (CDW) stakeholders to make informed decisions that minimise privacy risks while ensuring information utility.

We simulated various re-identification attempts conducted by an attacker with legitimate access to cohorts contained in the CDW of the Greater Paris University Hospitals. The dataset comprised 3,950,145 hospitalisation records with admission dates between August 1st, 2017 and April 1st, 2024. We considered minimisation and pseudonymisation schemes with different parameterisations, randomly shifting the timestamps of the delivered data while preserving different degrees of temporal coherence among them. The impact of these techniques was assessed both on the reliability of six representative archetypal epidemiological studies and on record uniqueness. Two attack scenarios were considered: a random-target attack and a target-in-cohort attack. The advantages and limitations of the different schemes were compared according to the specific requirements of the considered studies.
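To make "preserving different degrees of temporal coherence" concrete, here is a minimal sketch of random date shifting at three granularities. The granularities (per patient, per stay, per record), the uniform offset in `[-max_days, max_days]`, and the names `shift_dates`, `stay_lengths` and `readmission_gap` are illustrative assumptions, not the paper's actual schemes or parameters. The sketch shows which intervals survive each choice.

```python
import random
from datetime import date, timedelta

# Hypothetical toy records: (patient_id, stay_id, event, event_date)
RECORDS = [
    ("p1", "s1", "admission", date(2020, 3, 1)),
    ("p1", "s1", "discharge", date(2020, 3, 8)),
    ("p1", "s2", "admission", date(2021, 1, 10)),
    ("p1", "s2", "discharge", date(2021, 1, 12)),
    ("p2", "s3", "admission", date(2019, 6, 5)),
    ("p2", "s3", "discharge", date(2019, 6, 20)),
]

def shift_dates(records, level, max_days=30, seed=0):
    """Add a random offset in [-max_days, max_days] days, drawn once per group.

    level="patient": one offset per patient, so every interval is preserved
    level="stay":    one offset per stay, so only within-stay intervals survive
    level="record":  one offset per record, so no interval is guaranteed
    """
    rng = random.Random(seed)
    offsets, shifted = {}, []
    for i, (pid, sid, event, day) in enumerate(records):
        key = {"patient": pid, "stay": sid, "record": i}[level]
        if key not in offsets:
            offsets[key] = timedelta(days=rng.randint(-max_days, max_days))
        shifted.append((pid, sid, event, day + offsets[key]))
    return shifted

def stay_lengths(records):
    """Length of stay in days (discharge minus admission), keyed by stay_id."""
    admit, discharge = {}, {}
    for _, sid, event, day in records:
        (admit if event == "admission" else discharge)[sid] = day
    return {sid: (discharge[sid] - admit[sid]).days for sid in admit}

def readmission_gap(records, pid):
    """Days between a patient's first and last admission."""
    days = sorted(d for p, _, e, d in records if p == pid and e == "admission")
    return (days[-1] - days[0]).days

# Does each scheme keep lengths of stay and the gap between p1's two stays?
for level in ("patient", "stay", "record"):
    out = shift_dates(RECORDS, level)
    print(level,
          stay_lengths(out) == stay_lengths(RECORDS),
          readmission_gap(out, "p1") == readmission_gap(RECORDS, "p1"))
```

The per-patient scheme always prints `True True`. Coarser shifting groups keep more temporal coherence but, in exchange, leave more of a patient's timeline intact for an attacker to match against.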

Attack success rates varied widely – ranging from a median of 0.9% [IQR: 0.3%-9.4%] in the random-target scenario to 99% [IQR: 86%-100%] in the target-in-cohort scenario – with minimisation accounting for most of this variability. Although less effective, pseudonymisation provided an additional reduction in re-identification risk. However, achieving low uniqueness required substantial modifications to temporal coherence, compromising the reliability of certain epidemiological statistics.
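The gap between the two scenarios can be reproduced qualitatively on synthetic data. The sketch below is a simplified attack model of our own, not the paper's protocol. It assumes the attacker knows a target's age, sex and admission week, and counts a success only when exactly one released record matches and that record really is the target. All data, sizes and function names are hypothetical, and no date shifting is applied here.

```python
import random
from collections import Counter

def make_population(n, seed=0):
    """Hypothetical toy population of (person_id, age, sex, admission_week)."""
    rng = random.Random(seed)
    return [(i, rng.randint(18, 90), rng.choice("FM"), rng.randint(1, 52))
            for i in range(n)]

def quasi_id(person):
    """Attributes the attacker is assumed to know about the target."""
    _, age, sex, week = person
    return (age, sex, week)

def uniqueness(cohort):
    """Fraction of released records whose quasi-identifiers occur exactly once."""
    counts = Counter(quasi_id(p) for p in cohort)
    return sum(1 for p in cohort if counts[quasi_id(p)] == 1) / len(cohort)

def success_rate(targets, cohort):
    """Share of targets re-identified: exactly one released record matches the
    target's quasi-identifiers, and that record is really the target."""
    index = {}
    for person in cohort:
        index.setdefault(quasi_id(person), []).append(person[0])
    hits = 0
    for target in targets:
        matches = index.get(quasi_id(target), [])
        if len(matches) == 1 and matches[0] == target[0]:
            hits += 1
    return hits / len(targets)

population = make_population(50_000)
rng = random.Random(1)
cohort = rng.sample(population, 2_000)          # records released to the attacker
random_targets = rng.sample(population, 1_000)  # target may or may not be in the cohort
cohort_targets = rng.sample(cohort, 1_000)      # target known to be in the cohort

print("uniqueness      :", uniqueness(cohort))
print("random-target   :", success_rate(random_targets, cohort))
print("target-in-cohort:", success_rate(cohort_targets, cohort))
```

In this toy model, most random targets are simply absent from the cohort, which caps their success rate. When every cohort member is taken as a target, target-in-cohort success equals record uniqueness. The printed numbers are illustrative only and are not the paper's results.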

Pseudonymisation must therefore be combined with other solutions, in particular data minimisation, to provide optimal privacy protection within CDWs. Our findings highlight the need for tailored data protection strategies that align with specific study objectives to preserve data utility for epidemiological research. Our findings will help Institutional Review Boards, CDW governance bodies and CDW teams make informed decisions to mitigate privacy risks while maintaining information utility.
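Why minimisation carries so much weight can be illustrated with a uniqueness measure. This is a minimal sketch on a hypothetical synthetic cohort: the attributes (age, sex, admission day of year, ward) and the coarsening steps are illustrative assumptions, not the minimisation schemes evaluated in the paper. Each release is a coarsening of the previous one, so uniqueness can only stay the same or fall.

```python
import random
from collections import Counter

def uniqueness(rows):
    """Fraction of rows whose attribute combination appears exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)

rng = random.Random(0)
# Hypothetical toy cohort: (age, sex, admission_day_of_year, ward)
cohort = [(rng.randint(18, 90), rng.choice("FM"), rng.randint(1, 365), rng.randint(1, 20))
          for _ in range(5_000)]

# Candidate releases, from most to least detailed (illustrative only).
releases = {
    "full": cohort,
    "drop ward": [(a, s, d) for a, s, d, _ in cohort],
    "drop ward, 10-year age band": [(a // 10, s, d) for a, s, d, _ in cohort],
    "drop ward, age band, admission week": [(a // 10, s, (d - 1) // 7) for a, s, d, _ in cohort],
}

for name, rows in releases.items():
    print(f"{name:38s} uniqueness = {uniqueness(rows):.3f}")
```

The design choice to note is that a coarsening never splits groups apart, so a record that is unique after minimisation was already unique before it.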

The online version contains supplementary material available at 10.1186/s12911-026-03360-0.

## Full-text entities

- **Diseases:** cancer (MESH:D009369), death (MESH:D003643), bronchiolitis (MESH:D001988), pancreatic cancer (MESH:D010190), flu (MESH:D007251)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13020004/full.md

---
Source: https://tomesphere.com/paper/PMC13020004