# Evaluating algorithmic approaches to rare disease case-finding: a retrospective validation study using electronic health records

**Authors:** Freya Boardman-Pretty, Jyothika Kumar, Calum Grant, Elena Marchini, William Evans, Lara Menzies, Rand Dubis, Amanda Worker, Elizabeth Varones, Alan Warren, Jack Sams, Daniel Ollerenshaw, Jez Stockdale, Hadley Mahon, Peter Fish

PMC · DOI: 10.1186/s13023-026-04240-6 · Orphanet Journal of Rare Diseases · 2026-02-04

## TL;DR

This retrospective study evaluates case-finding algorithms for 34 rare diseases within MendelScan, an AI tool that analyses structured electronic health record (EHR) data. The algorithms showed very high specificity but low sensitivity.

## Contribution

The study provides a retrospective validation of case-finding algorithms for 34 rare diseases using EHR data, highlighting their potential for early diagnosis.

## Key findings

- MendelScan showed high specificity (median 99.9966%) but low sensitivity (median 3.8%) for rare disease detection.
- Positive likelihood ratios were high (median 1167), indicating that a flag substantially raises the likelihood that a patient has the target disease.
- Negative likelihood ratios were close to 1 (median 0.96), so an unflagged result does little to rule disease out.

## Abstract

Accurate and early diagnosis to optimise rare disease care is a global priority. With recent developments in artificial intelligence (AI)-based solutions, a promising area to improve rare disease diagnosis is the application of AI to routinely collected health care data captured in electronic health records (EHRs). MendelScan is a rare disease case-finding tool that analyses structured EHR data, using algorithms to identify patterns that are associated with an increased likelihood of the patient being affected by one of a number of rare diseases, in order to put such patients forward for further review.

In this paper, we evaluated the performance of case-finding algorithms for 34 rare diseases within MendelScan by performing a retrospective validation study using research EHR data. The primary objectives were to assess MendelScan’s ability to distinguish cases from controls, and to investigate other metrics indicating the feasibility of large-scale deployment and the time at which patients were identified (flagged) relative to their diagnosis. We measured algorithm performance by sensitivity, specificity, positive predictive value (PPV), and likelihood ratios.
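These metrics follow standard diagnostic-accuracy definitions. The sketch below is a minimal illustration, assuming each disease algorithm's output can be summarised as a 2x2 table of flagged and unflagged cases and controls. The `ConfusionCounts` class, the function names and the counts in the usage example are hypothetical and do not come from the study. Note that the medians reported in the paper are taken across algorithms, so plugging median sensitivity and specificity into these formulas will only approximate the reported median likelihood ratios.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 summary for one disease algorithm (cases vs controls)."""
    tp: int  # cases flagged by the algorithm
    fn: int  # cases not flagged
    fp: int  # controls flagged
    tn: int  # controls not flagged


def sensitivity(c: ConfusionCounts) -> float:
    # Proportion of true cases that were flagged.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Proportion of controls correctly left unflagged.
    return c.tn / (c.tn + c.fp)


def positive_lr(c: ConfusionCounts) -> float:
    # LR+ = sensitivity / (1 - specificity). The false-positive rate is taken
    # directly from counts to avoid precision loss when specificity is near 1.
    false_positive_rate = c.fp / (c.fp + c.tn)
    if false_positive_rate == 0:
        return math.inf  # no flagged controls: LR+ is unbounded
    return sensitivity(c) / false_positive_rate


def negative_lr(c: ConfusionCounts) -> float:
    # LR- = (1 - sensitivity) / specificity. Values near 1 mean that an
    # unflagged patient's odds of disease barely change.
    return (1.0 - sensitivity(c)) / specificity(c)


# Hypothetical counts: 500 cases and 1,000,000 controls.
counts = ConfusionCounts(tp=19, fn=481, fp=34, tn=999_966)
print(f"sensitivity = {sensitivity(counts):.4f}")
print(f"specificity = {specificity(counts):.6f}")
print(f"LR+ = {positive_lr(counts):.1f}, LR- = {negative_lr(counts):.3f}")
```

With counts chosen to give roughly the paper's median sensitivity and specificity, LR+ lands in the low thousands and LR- just under 1, which matches the qualitative pattern described in the results.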

Algorithm performance varied across metrics and across algorithms. Sensitivity ranged from 0 to 100%, but the majority of algorithms had a sensitivity under 25% (median = 3.8%, IQR: 1.2–12.6%), whereas specificity for most algorithms was above 99.995% (median = 99.9966%, IQR: 99.9925–99.9988%). Median PPV adjusted by literature prevalence was 3.1% (IQR: 0.7–14.6%) and adjusted by coded prevalence was 2.5% (IQR: 0.4–8.4%). Median positive likelihood ratio was 1167 (IQR: 125–4006), reflecting a strong signal for disease presence, and median negative likelihood ratio was 0.96 (IQR: 0.87–0.99), reflecting limited clinical utility of a negative result.
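A validation set's case-to-control ratio does not match population prevalence, which is why PPV is reported after adjustment to literature and coded prevalence. This summary does not state the exact adjustment procedure; the sketch below assumes the standard Bayes' theorem form, PPV = sens·p / (sens·p + (1 − spec)(1 − p)), and shows that it is equivalent to converting prior odds to posterior odds with LR+. The function names and the prevalence of 1 in 50,000 are hypothetical and chosen only for illustration.

```python
def adjusted_ppv(sens: float, spec: float, prevalence: float) -> float:
    """PPV at a given prevalence via Bayes' theorem (assumed adjustment method)."""
    true_pos = sens * prevalence                 # expected flagged cases per person
    false_pos = (1.0 - spec) * (1.0 - prevalence)  # expected flagged non-cases per person
    return true_pos / (true_pos + false_pos)


def ppv_from_lr(lr_pos: float, prevalence: float) -> float:
    """Same quantity via odds: posterior odds = prior odds * LR+."""
    prior_odds = prevalence / (1.0 - prevalence)
    post_odds = prior_odds * lr_pos
    return post_odds / (1.0 + post_odds)


# Hypothetical inputs: sensitivity 3.8%, specificity 99.9966%, prevalence 1 in 50,000.
sens, spec, prev = 0.038, 0.999966, 1 / 50_000
print(f"PPV (Bayes) = {adjusted_ppv(sens, spec, prev):.4f}")
print(f"PPV (odds)  = {ppv_from_lr(sens / (1.0 - spec), prev):.4f}")
```

With these inputs both functions give a PPV of roughly 2%, illustrating how an LR+ above 1000 can still yield a modest PPV when the prior probability of a rare disease is very low.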

Our findings demonstrate the potential of using routinely collected EHR data to facilitate earlier diagnosis of rare diseases. Real-world evaluations are required to fully ascertain the impact of such case-finding algorithms in assisting with the detection and diagnosis of patients with rare diseases.

The online version contains supplementary material available at 10.1186/s13023-026-04240-6.

## Linked entities

- **Diseases:** rare diseases (MONDO:0021200)

## Full-text entities

- **Diseases:** cancer (MESH:D009369), Williams syndrome (MESH:D018980), hepatitis C (MESH:D019698), Beckwith-Wiedemann syndrome (MESH:D001506), Gaucher's disease (MESH:D005776), genetic disorder (MESH:D030342), FOP (MESH:D009221), Rare Diseases (MESH:D035583), congenital malformations (OMIM:163000), developmental delays (MESH:D002658), Alpha-1-antitrypsin deficiency (MESH:D019896), CVID (MESH:D017074), Peutz-Jeghers syndrome (MESH:D010580), cardiac conditions (MESH:D006331), DiGeorge syndrome (MESH:D004062), cardiovascular disease (MESH:D002318), Turner syndrome (MESH:D014424), hypophosphatasia (MESH:D007014), eosinophilic oesophagitis (MESH:D000077277), immunodeficiencies (MESH:D007153), palate abnormalities (MESH:D000014)
- **Chemicals:** phosphate (MESH:D010710), calcium (MESH:D002118)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13041464/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13041464/full.md

## References

11 references — full list in the complete paper: https://tomesphere.com/paper/PMC13041464/full.md

---
Source: https://tomesphere.com/paper/PMC13041464