# Academic impact and research data utilisation of the Clinical Practice Research Datalink: scientometric analyses

**Authors:** Marta Pineda-Moncusí, Maria Rahman, Eleanor L. Axson, Susan Hodgson, Antonella Delmestri

PMC · DOI: 10.1007/s10654-025-01347-1 · 2026-01-24

## TL;DR

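This paper presents scientometric analyses of the research output, impact, and data usage of the UK Clinical Practice Research Datalink (CPRD) from 1988 to 2024, covering 3779 peer-reviewed publications and identifying the leading countries, institutions, journals, and datasets.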

## Contribution

Comprehensive scientometric analyses of CPRD-related research output, impact, and data usage from 1988 to 2024.

## Key findings

- The UK led in CPRD-related research output, followed by the US and Canada.
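- CPRD GOLD was the dominant primary care dataset: 86.35% of manuscripts used it exclusively, 8.39% used both CPRD GOLD and CPRD Aurum, and 4.76% used CPRD Aurum alone, although CPRD Aurum use increased in recent years.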
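- Between 2016 and 2024, 80.26% of articles were associated with CPRD research applications that referenced linked or CPRD algorithm-derived datasets, most often Hospital Episode Statistics (69.77%), Small Area Linkages (62.27%) and Office for National Statistics mortality data (53.28%).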

## Abstract

Since its establishment in the late 1980s, the UK Clinical Practice Research Datalink (CPRD) has become one of the most widely utilised data resources in both national and international research. Its value lies in the richness, scale and quality of its routinely collected primary care data, as well as the availability of numerous linkable datasets. This study provides comprehensive scientometric analyses of CPRD-related research output, impact, and data usage from 1988 to 2024. A total of 3779 peer-reviewed publications were identified, and for 98.78% of them, enriched bibliometric metadata were retrieved through Scopus and Web of Science. The UK emerged as the leading contributing country, with the United States and Canada ranking second and third. ‘McGill University’ was the most frequently affiliated institution, followed by the ‘University of Manchester’ and the ‘University of Oxford’, with seven UK universities among the top ten. The three journals most frequently publishing CPRD-based research overall, and since 2020, were ‘BMJ Open’, ‘Pharmacoepidemiology and Drug Safety’ and ‘British Journal of General Practice’. Analyses of primary care data sources utilisation revealed that overall, 86.35% of manuscripts used CPRD GOLD exclusively, 8.39% used both CPRD GOLD and CPRD Aurum, and 4.76% used CPRD Aurum alone, although recent years showed an increased use of CPRD Aurum. Between 2016 and 2024, most articles (80.26%) were associated with CPRD research applications that referenced linked or CPRD algorithm-derived datasets. The three most frequently used were ‘Hospital Episode Statistics’ (69.77%), ‘Small Area Linkages’ (62.27%) and ‘Office for National Statistics’ mortality data (53.28%).

The online version contains supplementary material available at 10.1007/s10654-025-01347-1.

## Full-text entities

- **Diseases:** DID (MESH:C564543), psoriasis (MESH:D011565), CDM (MESH:D004195), dementia (MESH:D003704), COPD (MESH:D029424), ILD (MESH:D017563), type 2 diabetes (MESH:D003924), respiratory conditions (MESH:D012131), asthma (MESH:D001249), COVID-19 (MESH:D000086382), Cancer (MESH:D009369), myocardial infarction (MESH:D009203), cardiovascular disease (MESH:D002318), OMOP (MESH:D011248), death (MESH:D003643), rheumatoid arthritis (MESH:D001172), CPRD (MESH:D014947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12975798/full.md

---
Source: https://tomesphere.com/paper/PMC12975798