# Comparison of the diagnostic performance of machine learning algorithms for differentiating iron deficiency anemia and thalassemia

**Authors:** Yun Wang, Xu Yuan, Weiwei Xiao, Yanjun Lu

PMC · DOI: 10.1007/s00277-026-06894-5 · Annals of Hematology · 2026-03-04

## TL;DR

This study compares five machine learning models for distinguishing iron deficiency anemia from thalassemia trait using blood parameters.

## Contribution

A LightGBM-based model was developed and externally validated for differentiating iron deficiency anemia (IDA) from thalassemia trait (TT).

## Key findings

- The LightGBM model reached an AUC of 0.992 and an accuracy of 98.5% in the external validation cohort (196 patients) when differentiating iron deficiency anemia (IDA) from thalassemia trait (TT).
- MCHC and RDW-SD were identified as the most important predictors for discrimination.
- The model offers a non-invasive decision-support tool for clinicians.

## Abstract

Accurate discrimination between iron deficiency anemia (IDA) and thalassemia trait (TT) is clinically essential for the effective management of patients with hypochromic microcytic anemia. Although numerous discrimination indices based on red blood cell (RBC) parameters have been proposed, their diagnostic accuracy remains suboptimal and highly population-specific. This study aimed to develop and validate a machine learning model to enhance discriminative performance. We utilized a derivation cohort of 376 patients (IDA, n = 186; TT, n = 190) for model development and internal validation, and a separate validation cohort of 196 patients for external testing. Five machine learning algorithms were trained and evaluated: Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Light Gradient Boosting Machine (LightGBM), Random Forest (RF), and AdaBoost. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The LightGBM classifier achieved an AUC of 0.953 (accuracy 86.9%) in the internal validation set and 0.980 (accuracy 93.1%) in the testing set. In the external validation cohort, the model demonstrated robust generalizability, attaining an AUC of 0.992 with an accuracy of 98.5% and a sensitivity of 96.7%. Feature importance analysis identified mean corpuscular hemoglobin concentration (MCHC) and red cell distribution width-standard deviation (RDW-SD) as the most discriminative predictors. We developed and externally validated a robust LightGBM-based classifier that accurately discriminates between IDA and TT, offering clinicians a reliable, non-invasive decision-support tool for the differential diagnosis of microcytic anemia.

The online version contains supplementary material available at 10.1007/s00277-026-06894-5.
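
The evaluation metrics named in the abstract (accuracy, sensitivity, specificity, PPV, NPV, and AUC) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the choice of TT as the positive class (label 1) are all assumptions made for illustration, since the abstract does not state them.

```python
# Illustrative sketch only: toy data, hypothetical function names.
# Assumption: label 1 = TT (positive class), label 0 = IDA.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def diagnostic_metrics(y_true, y_pred):
    """Compute threshold-dependent metrics from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half); equivalent to the ROC curve area."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 4 hypothetical TT cases and 4 hypothetical IDA cases.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
    print(diagnostic_metrics(y_true, y_pred))
    print("AUC:", auc(y_true, scores))
```

With these toy values each threshold-dependent metric equals 0.75 and the AUC equals 0.9375. The AUC is computed from the raw scores, so it does not depend on the threshold, while the other five metrics change whenever the cut-off moves.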

## Linked entities

- **Diseases:** iron deficiency anemia (MONDO:0001356), hypochromic microcytic anemia (MONDO:0000387)

## Full-text entities

- **Genes:** HBB (hemoglobin subunit beta) [NCBI Gene 3043] {aka CD113t-C, ECYT6, beta-globin}, ITGA2B (integrin subunit alpha 2b) [NCBI Gene 3674] {aka BDPLT16, BDPLT2, CD41, CD41B, FMAIT2, GP2B}, CD27 (CD27 molecule) [NCBI Gene 939] {aka S152, S152. LPFS2, T14, TNFRSF7, Tp55}, BCS1L (BCS1 ubiquinol-cytochrome c reductase complex chaperone) [NCBI Gene 617] {aka BCS, BCS1, BJS, FLNMS, GRACILE, Hs.6719}, HBA2 (hemoglobin subunit alpha 2) [NCBI Gene 3040] {aka ECYT7, HBA-T2, HBH}, SPN (sialophorin) [NCBI Gene 6693] {aka CD43, GALGP, GPL115, LEU-22, LSN}, CD14 (CD14 molecule) [NCBI Gene 929], PECAM1 (platelet and endothelial cell adhesion molecule 1) [NCBI Gene 5175] {aka CD31, CD31/EndoCAM, GPIIA', PECA1, PECAM-1, endoCAM}, TFRC (transferrin receptor) [NCBI Gene 7037] {aka CD71, IMD46, T9, TFR, TFR1, TR}, TF (transferrin) [NCBI Gene 7018] {aka HEL-S-71p, PRO1557, PRO2086, TFQTL1}, ITIH2 (inter-alpha-trypsin inhibitor heavy chain 2) [NCBI Gene 3698] {aka H2P, ITI-HC2, SHAP}
- **Diseases:** hypochromic microcytic anemia (MESH:C536357), IDA (MESH:D018798), TT (MESH:D013789), iron deficiency (MESH:D000090463), thyroid dysfunction (MESH:D013959), organ damage (MESH:D000092124), ML (MESH:D007859), depleted (MESH:C536350), alpha-Thalassemia (MESH:D017085), megaloblastic anemia (MESH:D000749), beta-Thalassemia (MESH:D017086), inherited hemoglobinopathy (MESH:D006453), hepatic, cardiac, and endocrine dysfunction (MESH:D004700), deletion (MESH:D002872), blood loss (MESH:D016063), Anemia (MESH:D000740), inflammatory disorders (MESH:D007249), iron overload (MESH:D019190), hematological disorder (MESH:D006402)
- **Chemicals:** folate (MESH:D005492), Iron (MESH:D007501), vitamin B12 (MESH:D014805)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** c.377T > C, ATG > AGG, A > T, IVS-I-1 (G > T), G > C, c.-82 C > A, c.316-197 C > T, g.223300_227103del, c.-79 A > G, g.215400_234700del, IVS-II-654 (C > T), c.-80T > C, p.Glu26Lys, c.2T > G, c.94delC, c.-10_-7delAAAC, CD26 (G > A), -29 (A > G), c.126_129delCTTT, c.427T > C, c.-78 A > G, -32 (C > A), g.219817_223755del, c.79G > A, c.45_46insG, c.84_85insC, c.216_217insA, -30 (T > C), c.130G > T, G > T, -28 (A > G)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12960304/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12960304/full.md

---
Source: https://tomesphere.com/paper/PMC12960304