# Prediction of bacteremia using routine hematological and metabolic parameters based on logistic regression and random forest models

**Authors:** Ting-Qiang Wang, Ying Zhuo, Chun-E Lv, Jing Shi, Ling-Hui Yao, Shi-Yan Zhang, Jinbao Shi

PMC · DOI: 10.3389/fcimb.2025.1605485 · Frontiers in Cellular and Infection Microbiology · 2025-07-28

## TL;DR

In a retrospective study of 287 inpatients, logistic regression and random forest models built on routine blood tests predicted bacteremia with similar discrimination (AUC 0.74 and 0.75, respectively); random forest had the higher recall (0.69 vs. 0.60).

## Contribution

The study compares logistic regression and random forest models for predicting bacteremia using routine lab data.

## Key findings

- Random forest had higher sensitivity (recall rate of 0.69) than logistic regression (0.60) for predicting bacteremia.
- Platelet count, PCT, triglycerides, and low cholesterol were identified as independent risk factors for bacteremia.
- Both models achieved similar AUCs (0.75 for random forest and 0.74 for logistic regression).

## Abstract

This study aimed to evaluate the predictive utility of routine hematological, inflammatory, and metabolic markers for bacteremia and to compare the classification performance of logistic regression and random forest models.

A retrospective study was conducted on 287 inpatients who underwent blood culture testing at Fuding Hospital, Fujian University of Traditional Chinese Medicine between March and August 2024. Patients were divided into bacteremia (n = 137) and non-bacteremia (n = 150) groups based on blood culture results. Hematological indices, inflammatory markers (e.g., C-reactive protein (CRP), procalcitonin (PCT)), metabolic indices (e.g., glucose, cholesterol), and nutritional markers (e.g., albumin) were analyzed. Univariate and multivariate binary logistic regression analyses were used to identify independent risk factors. Logistic regression and random forest models were developed using 33 features with a 70:30 train-test split and evaluated using receiver operating characteristic (ROC) curves, confusion matrices, and standard classification metrics.
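As an illustration of the evaluation step described above, the following minimal sketch computes a confusion matrix, recall (sensitivity), and ROC AUC in plain Python. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for this example.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels (1 = bacteremia, 0 = non-bacteremia)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def recall(counts: Dict[str, int]) -> float:
    """Sensitivity: the fraction of true bacteremia cases that the model flags."""
    return counts["tp"] / (counts["tp"] + counts["fn"])


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) but easy to verify.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data (NOT from the paper): true labels and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.15, 0.8, 0.35, 0.2, 0.1]

# Hypothetical 0.5 threshold; the summary does not state which cutoff was used.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
counts = confusion_counts(y_true, y_pred)

print(counts, recall(counts), roc_auc(y_true, scores))
```

Recall depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. That is one way two models can have nearly identical AUCs but noticeably different recall.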

Hemoglobin, cholesterol, and albumin levels were significantly lower in the bacteremia group, while platelet count, CRP, PCT, glucose, and triglycerides were significantly elevated (all p < 0.05). Logistic regression identified platelet count (odds ratio (OR) = 1.003, 95% confidence interval (CI): 1.001–1.006), PCT (OR = 1.032, 95% CI: 1.004–1.060), triglycerides (OR = 1.740, 95% CI: 1.052–2.879), and low cholesterol (OR = 0.523, 95% CI: 0.383–0.714) as independent risk factors. The area under the ROC curve (AUC) was 0.75 for the random forest model and 0.74 for logistic regression, with recall rates of 0.69 and 0.60, respectively.
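An odds ratio from logistic regression applies per one unit of the predictor as it was entered in the model, so a value close to 1 (such as 1.003 for platelet count) can still imply a sizeable effect over a clinically realistic change. The sketch below rescales the reported odds ratios under the usual logistic-regression assumption that the log-odds are linear in the predictor. The helper name `scaled_odds_ratio` and the example increments are illustrative assumptions; this summary does not state the measurement units.

```python
import math


def scaled_odds_ratio(or_per_unit: float, delta: float) -> float:
    """Odds ratio for a change of `delta` units, given the per-unit odds ratio.

    Assumes log-odds are linear in the predictor, so OR(delta) = OR ** delta.
    """
    return math.exp(math.log(or_per_unit) * delta)


# Platelet count: reported OR = 1.003 per unit. A hypothetical 100-unit
# increase multiplies the odds by about 1.35.
platelet_or_100 = scaled_odds_ratio(1.003, 100)

# Cholesterol: reported OR = 0.523 per unit increase. Equivalently, each
# one-unit DECREASE multiplies the odds by 1 / 0.523, roughly 1.91.
cholesterol_or_decrease = scaled_odds_ratio(0.523, -1)

print(round(platelet_or_100, 3), round(cholesterol_or_decrease, 3))
```

This is why an odds ratio below 1 for cholesterol is consistent with "low cholesterol" being described as a risk factor: higher values are associated with lower odds of bacteremia.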

Routine laboratory markers integrated into machine learning models demonstrated potential for early bacteremia prediction. Random forest exhibited superior sensitivity compared to logistic regression, suggesting its potential utility as a clinical screening tool.

## Linked entities

- **Chemicals:** procalcitonin (PubChem CID 71452493), glucose (PubChem CID 5793), cholesterol (PubChem CID 5997)
- **Diseases:** bacteremia (MONDO:0005229)

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}
- **Diseases:** bacteremia (MESH:D016470), inflammation (MESH:D007249)
- **Chemicals:** triglycerides (MESH:D014280), glucose (MESH:D005947), cholesterol (MESH:D002784)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12336153/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12336153/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12336153/full.md

---
Source: https://tomesphere.com/paper/PMC12336153