# An empirical assessment of differential privacy in real-world observational data: a case-control study of asthma exacerbation in UK Biobank linked with electronic health records

**Authors:** Mehrdad A Mizani, Aziz Sheikh, Amitava Banerjee

PMC · DOI: 10.1093/jamia/ocaf090 · 2025-06-18

## TL;DR

This study evaluates how differential privacy, across a range of epsilon values, affects the results of a case-control analysis of asthma exacerbation in UK Biobank data linked to electronic health records.

## Contribution

The paper empirically assesses differential privacy's impact on case-control study outcomes and provides insights for selecting privacy parameters.

## Key findings

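- Differential privacy changed the magnitude, direction, and statistical significance of odds ratios, occasionally resembling misclassification, residual confounding, and false-positive bias.
- Rare and imbalanced covariates showed greater odds-ratio variability under differential privacy, especially in matched analyses.
- Epsilon values below ln(2) (about 0.69) produced odds ratios that were unstable across random seeds; averaging results or using predetermined seeds may reduce this variability and mitigate p-hacking.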
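
For readers unfamiliar with the ln(2) threshold, the standard definition of epsilon-differential privacy helps interpret it. This summary does not restate the definition, so the notation here (mechanism $M$, neighboring datasets $D$ and $D'$, output set $S$) is an assumption taken from the usual textbook formulation rather than from the paper:

$$
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all } S .
$$

Setting $\varepsilon = \ln 2$ gives $e^{\varepsilon} = 2$, so adding or removing one participant can change the probability of any output by at most a factor of two. Smaller epsilons tighten this bound and require more noise, which is consistent with the instability reported below ln(2).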

## Abstract

Electronic health records (EHRs) provide substantial resources for observational studies, yet present significant challenges in safeguarding patient privacy while maintaining research quality. Differential privacy (DP) offers a quantifiable privacy guarantee; however, its impact on observational studies remains underexplored. We empirically evaluated the effects of DP across varying values of its privacy parameter, epsilon, on case-control analysis outcomes using EHR data. This study aims to inform DP parameter selection and examines the influence of study characteristics on differentially private observational studies.

We assessed the effects of DP on a case-control study of 1-year asthma exacerbations, including 22 165 participants with a history of asthma from UK Biobank linked to EHR data. Odds ratios (ORs) for sociodemographic factors and comorbidities were analyzed using adjusted and propensity score-matched models across epsilon values.

DP influenced the magnitude, direction, and statistical significance of ORs, occasionally resembling patterns of misclassification, residual confounding, and false-positive bias. Rare and imbalanced covariates showed greater OR variability, especially in matched studies. Epsilons smaller than ln(2) led to noticeable OR fluctuations.
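
The dependence on covariate prevalence and epsilon can be illustrated with a minimal sketch. The abstract does not state which DP mechanism the authors used, so this example assumes the Laplace mechanism applied to the four cells of a 2x2 exposure-by-outcome table with sensitivity 1. The tables, the helper names `dp_odds_ratio` and `log_or_spread`, and the clipping floor of 0.5 are all hypothetical and are not taken from the paper.

```python
import numpy as np

def dp_odds_ratio(table, epsilon, rng, sensitivity=1.0):
    """Odds ratio from a 2x2 table after adding Laplace noise to each cell.

    table = [[exposed cases,   exposed controls],
             [unexposed cases, unexposed controls]]
    Assumption: one participant affects one cell by 1, so scale = sensitivity / epsilon.
    """
    noise = rng.laplace(0.0, sensitivity / epsilon, size=(2, 2))
    noisy = np.clip(np.asarray(table, dtype=float) + noise, 0.5, None)  # keep cells positive
    (a, b), (c, d) = noisy
    return float((a * d) / (b * c))

def log_or_spread(table, epsilon, seeds):
    """Standard deviation of log(OR) across independent random seeds."""
    logs = [np.log(dp_odds_ratio(table, epsilon, np.random.default_rng(s))) for s in seeds]
    return float(np.std(logs))

# Hypothetical tables with the same true odds ratio scale but different prevalence.
common = [[400, 1600], [600, 2400]]   # covariate present in 40% of participants
rare = [[8, 12], [992, 3988]]         # covariate present in well under 1%

seeds = range(200)
for eps in (0.1, float(np.log(2)), 5.0):
    print(f"epsilon={eps:.2f}  "
          f"common spread={log_or_spread(common, eps, seeds):.3f}  "
          f"rare spread={log_or_spread(rare, eps, seeds):.3f}")
```

Under these assumptions the rare-covariate table has small cells, so noise of a given scale is a much larger fraction of each count and the log odds ratio varies more from seed to seed.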

The impact of DP on ORs and selection of an optimal epsilon depends on sample size, covariate prevalence, confounders, case-to-control ratios in propensity score matching, mitigation of random seed p-hacking, and trust models.

The effects of DP on ORs are highly context-dependent. In this study, epsilon values below ln(2) led to unstable ORs across random seeds. Averaging results or using predetermined seeds may help reduce variability and mitigate p-hacking.
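
The two mitigations named here, averaging and predetermined seeds, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes Laplace noise on 2x2 table cells with sensitivity 1, averages on the log scale, and uses a hypothetical seed list `PREREGISTERED_SEEDS` fixed before any analysis is run. It also does not account for how repeated noisy releases would be charged against a privacy budget.

```python
import numpy as np

# Hypothetical seed list, fixed in advance so analysts cannot pick a favorable seed.
PREREGISTERED_SEEDS = tuple(range(1000, 1020))

def noisy_log_or(table, epsilon, seed):
    """log(OR) from a 2x2 table with Laplace(1/epsilon) noise on each cell (assumed mechanism)."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(table, dtype=float) + rng.laplace(0.0, 1.0 / epsilon, size=(2, 2))
    (a, b), (c, d) = np.clip(noisy, 0.5, None)  # keep cells positive so the log is defined
    return float(np.log(a * d / (b * c)))

def averaged_or(table, epsilon, seeds=PREREGISTERED_SEEDS):
    """Geometric-mean odds ratio over a seed list chosen before the analysis."""
    return float(np.exp(np.mean([noisy_log_or(table, epsilon, s) for s in seeds])))

table = [[40, 160], [960, 3840]]  # hypothetical counts, not from the study
print(averaged_or(table, epsilon=0.5))
```

Because the seed list is fixed, rerunning `averaged_or` with the same inputs returns the same value, which removes the option of reporting only the seed that gives a preferred result.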

## Linked entities

- **Diseases:** asthma (MONDO:0004979)
- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** asthma (MESH:D001249)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12277706/full.md

---
Source: https://tomesphere.com/paper/PMC12277706