# XGBoost outperforms other machine learning models in diagnosing Sepsis-Associated Thrombocytopenia: a multicenter retrospective study

**Authors:** Busra Emir, Evrim Ozmen, Sukriye Miray Kilincer Bozgul, Caner Acar, Nur Soyer, Devrim Bozkurt, Kamil Gonderen, Mehmet Göktuğ Efgan

PMC · DOI: 10.3389/fmed.2026.1715551 · Frontiers in Medicine · 2026-02-09

## TL;DR

XGBoost was the most effective of four machine learning models for diagnosing sepsis-associated thrombocytopenia in a two-center retrospective study of 1,447 sepsis patients.

## Contribution

This study compares four machine learning models for diagnosing sepsis-associated thrombocytopenia and finds that XGBoost performs best on internal validation (cross-validation and a held-out test set).

## Key findings

- XGBoost achieved the highest accuracy (91.10%) and F1 score (92.31%) in cross-validation.
- XGBoost's AUROC was 98.60% in cross-validation and 97.50% in the test set, the highest among all models.
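- Artificial Neural Network (ANN) and Random Forest (RF) models also performed strongly; all four models, including Naive Bayes, had AUROC values above 90% in both cross-validation and the test set.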

## Abstract

Sepsis-associated thrombocytopenia is a frequent complication of sepsis and is associated with poor clinical outcomes. Accurate diagnosis remains challenging, and machine learning approaches may improve diagnostic performance. The objective of this study was to compare four machine learning models, namely Random Forest (RF), Artificial Neural Networks (ANN), Extreme Gradient Boosting (XGBoost), and Naive Bayes (NB), in diagnosing sepsis-associated thrombocytopenia and to evaluate their diagnostic performance against predefined performance criteria.

This retrospective cross-sectional study was conducted at two centers and used data from 1,447 sepsis patients extracted from electronic health records between January 2013 and December 2023. The dataset comprised demographic and clinical attributes together with laboratory test results. The data were partitioned into training and test sets in an 80:20 ratio. All models were trained on the training set, and 10-fold cross-validation was applied within the training set to assess the consistency of internal performance. Model performance was evaluated using accuracy, precision, F1 score, the area under the receiver operating characteristic curve (AUROC), and confusion matrices. The ten most important variables were ranked using classifier-based feature importance and SHAP (SHapley Additive exPlanations) analysis.
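The abstract specifies an 80:20 train/test split with 10-fold cross-validation confined to the training set, but not how the partitions were drawn. The sketch below assumes stratified sampling (class proportions preserved in every partition) and uses only the Python standard library; `stratified_split` and `stratified_kfold` are hypothetical helpers, not the authors' code. The property it illustrates is that the held-out test indices never enter any cross-validation fold.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Hypothetical helper: split indices into (train, test), keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-class share of the hold-out set
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def stratified_kfold(train_idx, labels, k=10, seed=0):
    """Hypothetical helper: yield (fit, validate) index lists over the training set only."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class round-robin so every fold gets a similar class mix
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, sorted(folds[f])

# Illustrative labels built from the cohort counts in the abstract
# (772 with thrombocytopenia, 675 without); the ordering is arbitrary.
labels = [1] * 772 + [0] * 675
train, test = stratified_split(labels)
folds = list(stratified_kfold(train, labels))
print(len(train), len(test), len(folds))  # → 1158 289 10
```

With scikit-learn available, `train_test_split(X, y, test_size=0.2, stratify=y)` and `StratifiedKFold(n_splits=10, shuffle=True)` from `sklearn.model_selection` would produce equivalent partitions.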

Thrombocytopenia occurred in 772 (53.4%) of the 1,447 sepsis patients. In cross-validation, XGBoost performed best, with an accuracy of 91.10% and an F1 score of 92.31%. ANN and RF achieved accuracies of 90.32% and 83.75%, with F1 scores of 91.51% and 85.32%, respectively. All four models (RF, ANN, XGBoost, and Naive Bayes) achieved AUROC values above 90% in both cross-validation and the test set.
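To put these figures in context, the sketch below computes accuracy, precision, recall, and F1 from a binary confusion matrix and applies them to a trivial baseline that labels every patient as thrombocytopenic. It assumes thrombocytopenia is the positive class, which the abstract does not state explicitly; the `Confusion` type and the helper functions are hypothetical. On the full cohort this baseline already reaches about 53.4% accuracy and an F1 of roughly 69.6%, so the reported 91.10% accuracy and 92.31% F1 for XGBoost are best read against that floor rather than against 50%.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Confusion:
    tp: int  # thrombocytopenia predicted and present
    fp: int  # predicted but absent
    fn: int  # present but missed
    tn: int  # correctly ruled out

def confusion(y_true, y_pred):
    """Tally a 2x2 confusion matrix, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
    )

def metrics(c):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (c.tp + c.tn) / total, "precision": precision,
            "recall": recall, "f1": f1}

# Majority-class baseline on the full cohort: call every patient positive.
y_true = [1] * 772 + [0] * 675
baseline = metrics(confusion(y_true, [1] * len(y_true)))
print({k: round(v, 3) for k, v in baseline.items()})
# → {'accuracy': 0.534, 'precision': 0.534, 'recall': 1.0, 'f1': 0.696}
```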

Among the evaluated algorithms, XGBoost performed best, achieving an AUROC of 98.60% in cross-validation and 97.50% in the test set, the strongest results observed in internal validation.
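AUROC can be read as the probability that a randomly chosen patient with thrombocytopenia receives a higher model score than a randomly chosen patient without it, with ties counting one half; a test-set AUROC of 97.50% therefore corresponds to correct ordering in about 97.5% of such pairs. The sketch below computes AUROC through this rank-based (Mann-Whitney U) identity using only the standard library; `auroc` is a hypothetical helper, not the evaluation code used in the study.

```python
def auroc(y_true, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores share their average rank."""
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover every item tied with pairs[i]
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # U counts correctly ordered (positive, negative) pairs, ties as 1/2
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

For real evaluations, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity.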

## Linked entities

- **Diseases:** thrombocytopenia (MONDO:0002049)

## Full-text entities

- **Genes:** CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, SLC17A5 (solute carrier family 17 member 5) [NCBI Gene 26503] {aka AST, ISSD, NSD, SD, SIALIN, SIASD}, SAT1 (spermidine/spermine N1-acetyltransferase 1) [NCBI Gene 6303] {aka DC21, KFSD, KFSDX, SAT, SSAT, SSAT-1}, ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, FGB (fibrinogen beta chain) [NCBI Gene 2244] {aka HEL-S-78p}
- **Diseases:** death (MESH:D003643), hypertension (MESH:D006973), fatalities (MESH:C565541), Thrombocytopenia (MESH:D013921), coagulation (MESH:D001778), infection (MESH:D007239), urinary tract infections (MESH:D014552), kidney disease (MESH:D007674), immune thrombocytopenia (MESH:D016553), septic (MESH:D001170), hyperbilirubinemia (MESH:D006932), Sepsis (MESH:D018805), critically ill (MESH:D016638), abnormalities in platelet production (MESH:D001791), inflammation (MESH:D007249), cancer (MESH:D009369), Sepsis-associated thrombocytopenia (MESH:D065166), diabetes (MESH:D003920), failure (MESH:D051437), chronic kidney disease (MESH:D051436), hemorrhagic complications (MESH:D006470), Organ Failure (MESH:D009102), hematologic malignancy (MESH:D019337), hypotension (MESH:D007022), hypoxemia (MESH:D000860), bone marrow failure (MESH:D000080983)
- **Chemicals:** NA (MESH:D012964), urea nitrogen (MESH:C530477), heparin (MESH:D006493), CRE (MESH:D003404), bilirubin (MESH:D001663), lactate (MESH:D019344)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12926416/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12926416/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/PMC12926416/full.md

---
Source: https://tomesphere.com/paper/PMC12926416