# Machine Learning‐Based Detection of HbS and HbC Carriers in the UK General Population

**Authors:** Frederik Christensen, Deniz Kenan Kılıç, Alexander Djupnes Fuglkjær, Jesper Petersen, Tarec Christoffer El‐Galaly, Andreas Glenthøj, Jens Helby, Izabela Ewa Nielsen

PMC · DOI: 10.1002/jha2.70170 · EJHaem · 2025-11-04

## TL;DR

This study trains machine learning models on routine blood test results to detect HbS and HbC carriers, with the aim of enabling cost-effective mass screening; the models were developed on data from 469,248 individuals in the UK general population.

## Contribution

A novel application of machine learning models to routine blood test results for detecting HbS and HbC carriers in a mixed general population, reducing reliance on specialised techniques.

## Key findings

- All three ML models (logistic regression, random forest and XGBoost) achieved high ROC-AUC scores (0.943–0.956) for detecting HbS and HbC carriers.
- At 95% sensitivity, specificities ranged from 76% to 78% across models.
- Model performance fell considerably when the models were applied only to Black individuals.
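To make the two headline metrics concrete, here is a minimal, dependency-free sketch of how ROC-AUC and specificity at a fixed sensitivity can be computed from model scores. The function names and toy data are hypothetical, and this is not the authors' evaluation code (which this summary does not show); in practice a library routine such as scikit-learn's `roc_auc_score` and `roc_curve` would be used instead.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive scores higher than a random negative.

    Ties count as half, which makes this equal to the normalised Mann-Whitney U.
    The pairwise loop is O(n_pos * n_neg): fine for a demo, too slow for ~470k rows.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target: float = 0.95
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) at the highest threshold whose
    sensitivity reaches `target`. A score >= threshold is flagged as positive."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Lower the threshold one positive at a time until enough positives are flagged.
    threshold = pos[-1]
    for i, s in enumerate(pos):
        if (i + 1) / len(pos) >= target:
            threshold = s
            break
    sensitivity = sum(s >= threshold for s in pos) / len(pos)
    specificity = sum(s < threshold for s in neg) / len(neg)
    return threshold, sensitivity, specificity


# Hypothetical toy scores: 1 = carrier, 0 = non-carrier.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(roc_auc(scores, labels))                           # 0.75
print(specificity_at_sensitivity(scores, labels, 0.95))  # (0.2, 1.0, 0.25)
```

Fixing sensitivity first and then reading off specificity mirrors how a screening test is usually tuned: missing carriers is costlier than sending some non-carriers for confirmatory testing.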

## Abstract

Haemoglobin S (HbS) and haemoglobin C (HbC) are the most important sickling variants on the African continent and impose a major health burden. Early detection of carrier status is crucial but is often hindered by limited resources.

The objective was to develop machine learning (ML) models that accurately classify HbS and HbC carriers from readily available routine blood tests, facilitating cost-effective mass screening.

We used demographic data and routine blood parameters from 469,248 individuals in the UK general population, including 1,635 individuals with HbS and/or HbC variants identified by whole exome sequencing, to develop ML models for carrier detection. Three ML models (Logistic Regression [LR], Random Forest [RF] and XGBoost [XGB]) were trained on 32 standard blood test results.
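Of the three model families, logistic regression is the simplest to write out. The sketch below trains a class-weighted logistic regression by batch gradient descent on z-scored features using only the Python standard library. Everything in it is an illustrative assumption rather than the authors' pipeline: the synthetic two-feature data, the inverse-frequency class weighting (motivated by the roughly 0.35% positive rate, 1,635 of 469,248), the learning rate and the epoch count. A real analysis would use a library such as scikit-learn's `LogisticRegression`.

```python
import math
import random


def standardise(rows):
    """Z-score each column; routine blood tests sit on very different scales."""
    d, n = len(rows[0]), len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows], means, sds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on class-weighted log loss. Requires both classes in y."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    n_pos = sum(y)
    # Inverse-frequency weights so the rare carrier class is not swamped.
    w_pos, w_neg = n / (2 * n_pos), n / (2 * (n - n_pos))
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = (w_pos if yi == 1 else w_neg) * (p - yi)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical synthetic data: 1 in 10 rows is a "carrier", shifted on feature 0 only.
random.seed(0)
rows, labels = [], []
for i in range(400):
    carrier = 1 if i % 10 == 0 else 0
    rows.append([random.gauss(-1.5 if carrier else 0.0, 1.0), random.gauss(0.0, 1.0)])
    labels.append(carrier)

X, means, sds = standardise(rows)
w, b = train_logistic_regression(X, labels)
print("weights:", w, "bias:", b)
```

On this toy data the learned weight on feature 0 should come out clearly negative and the weight on the pure-noise feature 1 near zero. Random forests and XGBoost replace the single linear score with ensembles of decision trees, which can capture non-linear interactions between blood parameters.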

All models demonstrated high discriminatory ability in the UK general population (ROC-AUC: LR 0.951; RF 0.943; XGB 0.956). At a sensitivity of 95%, specificities were 77% (LR), 76% (RF) and 78% (XGB). SHAP (SHapley Additive exPlanations) analysis showed that the most influential features were consistent across models. When the models were applied only to Black individuals, performance fell considerably.
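SHAP attributions are Shapley values from cooperative game theory. As a hedged illustration (neither the `shap` library nor the authors' analysis), the sketch below computes exact Shapley values by brute-force subset enumeration for a hypothetical three-feature linear logit with an assumed two-row background dataset. For a linear model under this interventional value function, the attribution of feature i reduces to w_i (x_i - mean_i), which the example reproduces. The `shap` package instead uses faster model-specific algorithms (for example TreeSHAP for random forests and XGBoost) rather than enumerating all 2^d subsets.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, background):
    """Exact Shapley values for model f at point x, enumerating all feature subsets.

    The value of a coalition S is the mean of f over background rows with the
    features in S fixed to x (an 'interventional' value function). Cost grows as
    2^d, so this is only practical for a handful of features.
    """
    d = len(x)

    def value(coalition):
        total = 0.0
        for z in background:
            total += f([x[j] if j in coalition else z[j] for j in range(d)])
        return total / len(background)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(d - size - 1) / factorial(d)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear logit over three features, with a two-row background set.
def logit(v):
    return 0.5 + 2.0 * v[0] - 1.0 * v[1] + 0.0 * v[2]


background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]  # feature means are all 1.0
x = [3.0, 0.0, 5.0]
phi = shapley_values(logit, x, background)
print(phi)  # approximately [4.0, 1.0, 0.0], i.e. w_i * (x_i - mean_i)
```

The attributions sum to the model output at x minus its average over the background (here 6.5 - 1.5 = 5.0), which is what allows features to be ranked and compared across different model types.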

ML models based on routine blood tests effectively identify HbS and HbC carriers in a mixed general population. This approach has the potential to enhance screening efficiency by reducing reliance on specialised techniques.
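As a back-of-the-envelope illustration of what 95% sensitivity with 76-78% specificity could mean for screening workload, the sketch below converts the reported operating points into expected counts per 100,000 people screened. It assumes, and this is an assumption rather than a finding of the paper, that the screening population has the same frequency of HbS/HbC variants as the study cohort (1,635 of 469,248, about 0.35%), and it treats all 1,635 individuals as the positive class. The function name is hypothetical.

```python
def screening_yield(prevalence, sensitivity, specificity, population=100_000):
    """Expected counts when a classifier screens `population` people."""
    carriers = prevalence * population
    non_carriers = population - carriers
    detected = sensitivity * carriers                  # true positives
    false_alarms = (1.0 - specificity) * non_carriers  # false positives
    flagged = detected + false_alarms                  # sent for confirmatory testing
    return {
        "carriers": carriers,
        "detected": detected,
        "flagged": flagged,
        "spared_confirmatory_test": population - flagged,
        "ppv": detected / flagged,
    }


# Assumption: screening population has the same variant frequency as the study cohort.
prevalence = 1635 / 469_248
for model, spec in [("LR", 0.77), ("RF", 0.76), ("XGB", 0.78)]:
    r = screening_yield(prevalence, sensitivity=0.95, specificity=spec)
    print(f"{model}: flagged {r['flagged']:.0f} per 100,000, PPV {r['ppv']:.1%}")
```

Under these assumptions, roughly 22,000 to 24,000 of every 100,000 people would be referred for a confirmatory test, so about three quarters would not need one, while roughly one referral in 70 would be a true variant carrier. In a population with a higher variant frequency, the positive predictive value would be correspondingly higher.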

## Linked entities

- **Diseases:** sickle cell disease (MONDO:0011382)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12584039/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12584039/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12584039/full.md

---
Source: https://tomesphere.com/paper/PMC12584039