# Concordance of Lung Cancer, Melanoma, and Renal Cell Cancer Diagnosis Information Recorded in Health Care Databases in England: Analysis of Linkage Between Primary Care, Hospital Care, and Cancer Registry Data

**Authors:** Paul D. Kruithof, Patrick C. Souverein, Johanna H. M. Driessen, Lizza E. L. Hendriks, Sander Croes, Robin M. J. M. van Geel

PMC · DOI: 10.1002/pds.70299 · 2026-01-14

## TL;DR

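This study evaluates how well cancer diagnosis records align across linked primary care, hospital care, and cancer registry databases in England. Diagnosis concordance exceeded 70%; most lung cancer diagnoses were registered within 3 months across data sources, whereas melanoma and renal cell cancer showed longer delays.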

## Contribution

The study provides a detailed assessment of data linkage quality across primary care, hospital care, and cancer registry databases for three cancer types in England.

## Key findings

- Concordance of cancer diagnosis records between NCRAS, CPRD Aurum, and HES-APC exceeded 70%.
- Most lung cancer diagnoses were captured within 3 months of initial diagnosis across datasets.
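- Far fewer patients were matched in SACT (3.0%–21.1%), as anticipated because SACT registers only patients who receive systemic anticancer treatment; matched numbers were particularly low among patients over 80 years of age.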

## Abstract

Real‐world evidence (RWE) addresses clinical trial limitations by capturing more representative patient populations and improves evaluation of anticancer treatments, although it becomes available only years after market authorization. As many RWE sources capture only parts of the healthcare continuum, dataset linkage is necessary to enhance data richness. Linkage quality must be assessed to prevent information bias due to incomplete data linkage.

We evaluated diagnosis concordance for lung cancer (LC), melanoma, and renal cell cancer (RCC) in England. Patients were matched on National Health Service (NHS) number, sex, and date of birth. Eligible patients were drawn from the National Cancer Registration and Analysis Service (NCRAS) and matched with three other datasets: Clinical Practice Research Datalink Aurum (CPRD Aurum), Hospital Episode Statistics Admitted Patient Care (HES‐APC), and systemic anticancer treatment (SACT). Concordance was evaluated for cancer diagnosis and date of diagnosis. Determinants of non‐concordance were investigated to assess representativeness.
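The matching and diagnosis-concordance step described above can be illustrated with a minimal sketch. It assumes a simple deterministic match on the three identifiers named in the text; the record layout, field names, toy values, and the helper names `link_key` and `diagnosis_concordance` are all hypothetical and do not reflect the actual dataset schemas or the authors' code.

```python
from datetime import date

# Hypothetical toy records; field names and values are illustrative only.
ncras = [
    {"nhs_number": "A1", "sex": "F", "dob": date(1950, 1, 2), "cancer": "LC"},
    {"nhs_number": "A2", "sex": "M", "dob": date(1944, 6, 30), "cancer": "RCC"},
    {"nhs_number": "A3", "sex": "F", "dob": date(1961, 3, 15), "cancer": "melanoma"},
]
hes_apc = [
    {"nhs_number": "A1", "sex": "F", "dob": date(1950, 1, 2), "cancer": "LC"},
    {"nhs_number": "A2", "sex": "M", "dob": date(1944, 6, 30), "cancer": "LC"},
]


def link_key(record):
    """Deterministic linkage key: NHS number, sex, and date of birth."""
    return (record["nhs_number"], record["sex"], record["dob"])


def diagnosis_concordance(reference, other):
    """Share of reference patients with a same-key, same-cancer record in `other`."""
    if not reference:
        return 0.0
    # Index the comparison dataset by linkage key, collecting recorded cancers.
    index = {}
    for record in other:
        index.setdefault(link_key(record), set()).add(record["cancer"])
    matched = sum(
        1 for record in reference
        if record["cancer"] in index.get(link_key(record), set())
    )
    return matched / len(reference)


concordance = diagnosis_concordance(ncras, hes_apc)
print(f"Diagnosis concordance: {concordance:.1%}")
```

In this toy example only patient A1 has the same cancer recorded in both sources: A2 links on identifiers but carries a different diagnosis, and A3 has no linked record at all.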

In total, 89 797 patients with LC, melanoma or RCC were identified, and concordance of cancer diagnosis records between NCRAS, CPRD Aurum and HES‐APC exceeded 70%. Because patients are only registered in SACT upon receiving systemic anticancer treatment, matched numbers in SACT were significantly lower (3.0%–21.1%), as anticipated, particularly among patients over 80 years of age. However, differences in patient characteristics across datasets were limited. Concordance analyses showed that the majority of cases with LC diagnoses were registered within 3 months of the initial diagnosis within all data sources, whereas melanoma and RCC showed longer delays.
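The date-of-diagnosis comparison can be sketched in the same spirit. This example assumes that "within 3 months" is approximated as an absolute difference of at most 90 days between the registry date and the date in another source; that window, the direction-agnostic comparison, the toy dates, and the function names are assumptions made for illustration, not the definition used in the study.

```python
from datetime import date

WINDOW_DAYS = 90  # assumption: "3 months" approximated as 90 days


def within_window(reference_date, other_date, window_days=WINDOW_DAYS):
    """True if the two diagnosis dates differ by at most `window_days`."""
    return abs((other_date - reference_date).days) <= window_days


def share_within_window(date_pairs, window_days=WINDOW_DAYS):
    """Fraction of (reference, other) date pairs that fall inside the window."""
    if not date_pairs:
        return 0.0
    hits = sum(
        1 for reference_date, other_date in date_pairs
        if within_window(reference_date, other_date, window_days)
    )
    return hits / len(date_pairs)


# Hypothetical (registry date, other-source date) pairs for linked patients.
pairs = [
    (date(2015, 1, 10), date(2015, 2, 1)),   # recorded 22 days later
    (date(2015, 3, 1), date(2015, 9, 1)),    # recorded 184 days later
    (date(2016, 5, 20), date(2016, 5, 20)),  # same day
    (date(2017, 7, 1), date(2017, 3, 20)),   # recorded 103 days earlier
]

share = share_within_window(pairs)
print(f"Share of linked diagnoses within the window: {share:.0%}")
```

Computing this share separately per cancer type and per data source would give the kind of comparison reported above, where lung cancer shows shorter registration delays than melanoma and RCC.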

Given the high concordance, NCRAS data can be enriched with HES‐APC and CPRD Aurum, and further complemented by SACT for systemic therapy. Provided that SACT undergoes further validation, linkage between NCRAS, CPRD Aurum, HES‐APC, and SACT may be a promising resource for RWE generation in oncology research.

Registry linkage is essential for real‐world evidence, as key outcome measures are recorded in separate registries.

NCRAS registrations of selected cancer diagnoses can be adequately linked with corresponding patient records in other datasets.

Most lung cancer diagnoses were concordantly captured within 3 months of initial diagnosis.

Melanoma and renal cell carcinoma diagnoses demonstrate longer registration delays.

Provided SACT recording is validated, it may offer a promising opportunity for real‐world anticancer research, as over 90% of SACT‐registered patients can be linked to primary care data via CPRD Aurum.

RWE is important in the evaluation of outcomes of anticancer agents. Many RWE sources capture the healthcare continuum only partially, and linkage is needed to improve data richness. We evaluated diagnosis concordance in four datasets to assess linkage completeness. Patient records were drawn from NCRAS if patients were diagnosed with lung cancer, melanoma, or RCC between 2011 and 2018, and if linkage to CPRD Aurum was possible based on age, sex, and NHS number; 89 797 patients in NCRAS were matched with data from the CPRD Aurum, HES‐APC, and SACT datasets. Diagnosis concordance between registries was substantial (all > 70%). Few patients were matched in SACT, which seems consistent with the expected small number of cases eligible for and willing to receive systemic treatments. However, given the high concordance, NCRAS data can be enriched with HES‐APC and CPRD Aurum and further complemented by SACT data for systemic therapy. Provided that SACT undergoes further validation, linkage between NCRAS, CPRD Aurum, HES‐APC, and SACT may be a promising resource for RWE generation in oncology research.

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138), melanoma (MONDO:0005105), renal cell cancer (MONDO:0003007)

## Full-text entities

- **Diseases:** Cancer (MESH:D009369), LC (MESH:D008175), Melanoma (MESH:D008545), RCC (MESH:D002292)
- **Species:** Homo sapiens (human, species) [taxon 9606]

---
Source: https://tomesphere.com/paper/PMC12803871