The challenges of replication: A worked example of methods reproducibility using electronic health record data
Richard Williams, Thomas Bolton, David Jenkins, Mehrdad A. Mizani, Matthew Sperrin, Cathie Sudlow, Angela Wood, Adrian Heald, Niels Peek, Youssef El Khatib

TL;DR
This paper explores the challenges of replicating a study using electronic health record data and offers recommendations to improve reproducibility.
Contribution
The paper introduces a new concept called 'data reproducibility' and provides practical recommendations for improving replication in EHR studies.
Findings
Differences between data environments and sources caused challenges in methods reproducibility.
Recommendations include better metadata, standardized governance, code sharing, and support structures.
Data reproducibility is identified as a new theme requiring further research.
Abstract
The ability to reproduce the work of others is an essential part of the scientific disciplines. Replicating observational studies using electronic health record (EHR) data can be challenging due to complexities in data access, variations in EHR systems across institutions, and the potential for unaccounted confounding variables. Our aim is to identify the barriers to methods reproducibility for replication studies using EHR data. We replicated a study that examined the risk of hospitalisation following a positive COVID-19 test in individuals with diabetes. Using EHR data from NHS England’s Secure Data Environment (SDE) covering the whole of England, UK (population 57m), we sought to replicate findings from the original study, which used data from Greater Manchester (a large urban region in the UK, population 2.9m). Both analyses were conducted in Trusted Research Environments…
Figure 1
Figure 2
Figure 3
Taxonomy
Topics: Machine Learning in Healthcare · Ethics in Clinical Research · Scientific Computing and Data Management
