# PreBP: an interpretable, optimized ensemble framework using routine complete blood count for rapid pathogen identification in bacterial pneumonia

**Authors:** Xiaoxi Hao, Dingjian Liang, Yimin Shen, Cuimin Sun, Wei Lan

PMC · DOI: 10.3389/fbinf.2025.1769816 · Frontiers in Bioinformatics · 2026-01-14

## TL;DR

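This paper introduces PreBP, an interpretable ensemble machine learning framework that uses routine complete blood count (CBC) data to identify which of four bacterial pathogens is causing a patient's pneumonia, offering a faster complement to culture-based diagnosis.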

## Contribution

PreBP is an interpretable ensemble framework with metaheuristic-optimized hyperparameters that identifies the causative pathogen in bacterial pneumonia from routine CBC data.

## Key findings

- PreBP achieved the best overall performance among the compared models, with an AUC of 0.920, precision of 87.1%, and accuracy and sensitivity both at 86.7%.
- A dual-phase feature selection strategy combining Lasso and Boruta refines the CBC parameters down to key biomarkers.
- SHAP values provide both global and local interpretability for model predictions.
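To make the "local, case-level" idea concrete, the sketch below computes exact Shapley values for a single sample by enumerating feature coalitions, which is the quantity SHAP approximates for larger models. Everything in it is an assumption for illustration: the feature names, the baseline values, and `toy_score` are hypothetical and are not the PreBP model or its biomarkers.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to their baseline value,
    so each value is a weighted average of marginal contributions.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (x[f] if f in coalition else baseline[f]) for f in names
                }
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


# Hypothetical scoring function with an interaction term (NOT the PreBP model).
def toy_score(f):
    return (
        0.02 * f["wbc"]
        + 0.01 * f["neutrophil_pct"]
        + 0.001 * f["wbc"] * f["neutrophil_pct"]
    )


# Hypothetical reference values and one hypothetical patient.
baseline = {"wbc": 7.0, "neutrophil_pct": 60.0, "platelets": 250.0}
patient = {"wbc": 15.0, "neutrophil_pct": 85.0, "platelets": 250.0}

phi = shapley_values(toy_score, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
```

The per-feature values sum to the difference between the patient's score and the baseline score, which is what lets a local explanation be read as an additive breakdown of one prediction. A feature the model ignores (here `platelets`) receives a value of zero.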

## Abstract

Bacterial pneumonia remains a major global health challenge, and early pathogen identification is important for timely and targeted treatment. However, conventional microbiological diagnostics such as sputum or blood culture are labor-intensive and time-consuming.

We propose an interpretable ensemble learning framework (PreBP) for rapid pathogen identification using routinely available complete blood count (CBC) parameters. We analyzed 1,334 CBC samples from patients with culture-confirmed bacterial pneumonia caused by four major pathogens: Pseudomonas aeruginosa, Escherichia coli, Staphylococcus aureus, and Streptococcus pneumoniae. Pathogen labels were determined based on clinical culture results. Five machine learning models (extreme gradient boosting (XGBoost), multilayer perceptron neural network (MLPNN), adaptive boosting (AdaBoost), random forest (RF), and extremely randomized trees (ExtraTrees)) were trained as comparators, and PreBP was developed with metaheuristic-optimized hyperparameters. Key CBC biomarkers were refined using a dual-phase feature selection strategy combining Lasso and Boruta. To enhance transparency, SHapley additive explanations (SHAP) were applied to provide both global biomarker importance and local, case-level explanations.
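The Boruta half of the feature selection step rests on one idea: a feature is kept only if it is consistently more important than shuffled "shadow" copies of the features. The following is a minimal sketch of that idea, not the authors' pipeline. It swaps in absolute Pearson correlation for the random forest importance that Boruta normally uses, omits the Lasso phase, and runs on synthetic data; `shadow_select`, `keep_fraction`, and the column names are hypothetical.

```python
import random
from statistics import mean, pstdev


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def shadow_select(columns, target, trials=20, keep_fraction=0.9, seed=0):
    """Keep features that beat the best shadow feature in most trials."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(trials):
        # Shadow features: shuffled copies with any real signal destroyed.
        shadow_scores = []
        for values in columns.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_scores.append(abs_corr(shuffled, target))
        threshold = max(shadow_scores)
        for name, values in columns.items():
            if abs_corr(values, target) > threshold:
                hits[name] += 1
    return [name for name, h in hits.items() if h >= keep_fraction * trials]


# Synthetic data: one informative column and two pure-noise columns.
data_rng = random.Random(42)
target = [i % 2 for i in range(200)]
columns = {
    "informative": [t + data_rng.gauss(0, 0.3) for t in target],
    "noise_a": [data_rng.gauss(0, 1) for _ in target],
    "noise_b": [data_rng.gauss(0, 1) for _ in target],
}

selected = shadow_select(columns, target)
print(selected)
```

On data like this the informative column clears the shadow threshold in every trial, while noise columns usually do not. A real Boruta run would replace `abs_corr` with tree-based importances and use a statistical test rather than a fixed `keep_fraction`.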

PreBP achieved the best overall performance, with an AUC of 0.920, precision of 87.1%, and accuracy and sensitivity of 86.7%.
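The summary does not state how the per-class metrics were averaged across the four pathogens. One reading that would be consistent with accuracy and sensitivity being identical is support-weighted averaging, because weighted recall in a multiclass problem always equals accuracy. The sketch below shows this on made-up labels; the label codes and predictions are hypothetical and are not the study's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall (sensitivity), averaged with class-support weights."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        total += (n / len(y_true)) * (tp / n)
    return total


def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with class-support weights."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        precision = tp / predicted[label] if predicted[label] else 0.0
        total += (n / len(y_true)) * precision
    return total


# Hypothetical labels: PA, EC, SA, SP stand in for the four pathogens.
y_true = ["PA", "PA", "PA", "EC", "EC", "SA", "SA", "SP"]
y_pred = ["PA", "PA", "EC", "EC", "EC", "SA", "SP", "SP"]

print("accuracy:          ", accuracy(y_true, y_pred))
print("weighted recall:   ", weighted_recall(y_true, y_pred))
print("weighted precision:", weighted_precision(y_true, y_pred))
```

Weighted precision generally differs from accuracy, which matches the pattern of the reported figures (87.1% versus 86.7%). AUC is left out here because it requires predicted class probabilities rather than hard labels.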

Because the framework relies on routine CBC measurements, it can generate interpretable predictions as soon as CBC results are available, which may provide supplementary evidence for earlier pathogen-oriented clinical decision-making alongside culture-dependent workflows. Overall, PreBP offers an interpretable computational approach to pathogen identification in bacterial pneumonia based on routine laboratory data.

_Graphical abstract:_ the conventional culture workflow (specimen collection, digestion, centrifugation, media inoculation, bacterial culture, pathogen identification) shown alongside the PreBP workflow, in which blood test data pass through data input, feature selection, and model analysis into the PreBP model ensemble, which produces the prediction, with iterative model training and evaluation.

## Linked entities

- **Diseases:** bacterial pneumonia (MONDO:0004652)
- **Species:** Pseudomonas aeruginosa (taxon 287), Escherichia coli (taxon 562), Staphylococcus aureus (taxon 1280), Streptococcus pneumoniae (taxon 1313)

## Full-text entities

- **Diseases:** Bacterial pneumonia (MESH:D018410)
- **Species:** Escherichia coli (E. coli, species) [taxon 562], Homo sapiens (human, species) [taxon 9606], Pseudomonas aeruginosa (species) [taxon 287], Staphylococcus aureus (species) [taxon 1280], Streptococcus pneumoniae (species) [taxon 1313]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12847367/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12847367/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12847367/full.md

---
Source: https://tomesphere.com/paper/PMC12847367