Examining the impact of data quality and completeness of electronic health records on predictions of patients' risks of cardiovascular disease
Yan Li, Matthew Sperrin, Glen P. Martin, Darren M. Ashcroft, Tjeerd Pieter van Staa

TL;DR
This study investigates how variations in data quality and completeness of electronic health records influence the robustness of cardiovascular risk predictions using QRISK3 across multiple practices.
Contribution
It demonstrates that heterogeneity in CVD incidence is largely unaffected by data quality variations, emphasizing the need for clinical judgment alongside risk models.
Findings
Significant heterogeneity in CVD incidence between practices.
Data quality and completeness were similar across practices.
Variation in risk predictions was not explained by data quality differences.
Abstract
The objective is to assess the extent of variation in data quality and completeness of electronic health records, and its impact on the robustness of risk predictions of incident cardiovascular disease (CVD) using a risk prediction tool that is based on routinely collected data (QRISK3). The study is a longitudinal cohort study set in 392 general practices (including 3.6 million patients) linked to hospital admission data. Variation in data quality was assessed using Saez stability metrics, which quantify the outlyingness of each practice. Statistical frailty models evaluated whether the accuracy of individual QRISK3 predictions and the effects of overall risk factors (linear predictor) varied between practices. There was substantial heterogeneity between practices in CVD incidence unaccounted for by QRISK3. In the lowest quintile of statistical frailty, a QRISK3 predicted…
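For intuition about what a practice-level frailty does to an individual prediction, here is a minimal sketch; it is not the authors' code. It assumes a multiplicative frailty z on the hazard under proportional hazards, in which case survival S becomes S^z and a predicted risk p becomes 1 - (1 - p)^z. The function name and the frailty values are hypothetical, chosen for illustration only, and are not estimates from the study.

```python
def risk_with_frailty(predicted_risk: float, frailty: float) -> float:
    """Rescale a predicted risk by a multiplicative hazard frailty.

    Assumption: proportional hazards with the frailty multiplying the
    hazard, so survival S becomes S ** frailty and the risk becomes
    1 - (1 - predicted_risk) ** frailty.
    """
    if not 0.0 <= predicted_risk <= 1.0:
        raise ValueError("predicted_risk must lie in [0, 1]")
    if frailty <= 0.0:
        raise ValueError("frailty must be positive")
    return 1.0 - (1.0 - predicted_risk) ** frailty


if __name__ == "__main__":
    # Hypothetical frailties: below 1 lowers the hazard, above 1 raises it.
    for z in (0.8, 1.0, 1.2):
        adjusted = risk_with_frailty(0.10, z)
        print(f"frailty={z:.1f} -> adjusted 10-year risk={adjusted:.3f}")
```

A frailty of 1 leaves the prediction unchanged, so the spread of adjusted risks around the original value shows how much the same predicted risk could differ between practices under this assumption.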
