# Comparison and validation of multiple machine learning algorithms for predicting MDRO infection in catheter-related bloodstream patients: a multicenter cohort study

**Authors:** Hongwei Wang, Caizheng Yang, Ming Zhao, Fen Ren, Xueyu Wang, Haihua Yan, Weiwei Qin, Fangying Tian, Linping Li

PMC · DOI: 10.1128/spectrum.03713-25 · Microbiology Spectrum · 2026-02-11

## TL;DR

This study develops and externally validates a machine learning model that predicts multidrug-resistant organism (MDRO) infection in patients with catheter-related bloodstream infection (CRBSI), with the aim of supporting earlier treatment decisions.

## Contribution

The main contribution is an interpretable machine learning model that predicts multidrug-resistant catheter-related bloodstream infection (MDR-CRBSI) early from readily available clinical variables and is externally validated on a multicenter cohort.

## Key findings

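- XGBoost achieved an AUC of 0.877 (95% CI: 0.854–0.900) in the training set and 0.851 (95% CI: 0.826–0.876) in the external multicenter validation cohort (n = 362) for predicting MDR-CRBSI.
- Key predictors identified by SHAP analysis included red blood cell distribution width (RDW), C-reactive protein (CRP), platelet count, pH, length of hospital stay, and class of antibiotics used.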
- The model supports early clinical intervention and antimicrobial stewardship.

## Abstract

Early identification of patients at high risk for multidrug-resistant organism (MDRO) infection in catheter-related bloodstream infection (CRBSI) is crucial for precise antimicrobial therapy. This study aimed to develop and externally validate a machine learning (ML) model to predict this risk, thereby supporting early clinical intervention.

Patients with CRBSI were extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database and classified into MDRO and non-MDRO groups based on microbiological culture and antimicrobial susceptibility testing. Missing data in 51 clinical variables were handled using Random Forest-based multiple imputation. Ten predictive features were selected by integrating correlation heatmap analysis, variance inflation factor, and least absolute shrinkage and selection operator (LASSO) regression. Eight ML models, including XGBoost and Random Forest, were constructed and tuned via hyperparameter optimization. The optimal model was selected primarily using the area under the receiver operating characteristic curve (AUC), supplemented by the F1-score, Brier score, accuracy, and recall. Its performance was further evaluated using a confusion matrix and calibration curve. External validation was performed on a real-world multicenter cohort (n = 362) to assess generalizability. Model interpretability was analyzed using SHapley Additive exPlanations (SHAP).

A total of 1,251 patients with CRBSI were enrolled in the development cohort, among whom 189 (15.1%) were diagnosed with multidrug-resistant CRBSI (MDR-CRBSI). Significant differences were observed between the two groups in indicators of inflammatory status and organ functional reserve (P < 0.05). Ten predictive features were identified using LASSO regression. Among the models evaluated, XGBoost exhibited the best performance in the training set, with an AUC of 0.877 (95% CI: 0.854–0.900), and also demonstrated favorable results in other evaluation metrics. The model maintained robust predictive ability in the external multicenter validation cohort, achieving an AUC of 0.851 (95% CI: 0.826–0.876). SHAP analysis identified red blood cell distribution width (RDW), C-reactive protein (CRP), platelet count, pH, length of hospital stay, and class of antibiotics used as key predictors of MDR-CRBSI.

Among the eight ML models developed and validated, XGBoost demonstrated superior performance in both internal and external validation. Its predictive capability is driven by 10 key variables, such as RDW and CRP, enabling early identification of high-risk MDR-CRBSI patients and providing a valuable tool for guiding precise antimicrobial therapy.
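The evaluation metrics named in the abstract (AUC, Brier score, accuracy, recall, F1-score, and the confusion matrix) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and probabilities are all assumptions invented for illustration, and none of the numbers are study data.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A positive scored above a negative is a win; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Confusion-matrix counts plus accuracy, recall, and F1 at one threshold."""
    # The 0.5 default is an assumption; the summary does not state a cutoff.
    pairs = [(y, 1 if p >= threshold else 0) for y, p in zip(y_true, y_prob)]
    tp = sum(1 for y, pred in pairs if y == 1 and pred == 1)
    tn = sum(1 for y, pred in pairs if y == 0 and pred == 0)
    fp = sum(1 for y, pred in pairs if y == 0 and pred == 1)
    fn = sum(1 for y, pred in pairs if y == 1 and pred == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy data: 1 = MDR-CRBSI, 0 = non-MDR; probabilities are made up.
TOY_LABELS = [0, 0, 0, 0, 1, 0, 1, 1]
TOY_PROBS = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]

print("AUC:", round(roc_auc(TOY_LABELS, TOY_PROBS), 3))
print("Brier:", round(brier_score(TOY_LABELS, TOY_PROBS), 3))
print(threshold_metrics(TOY_LABELS, TOY_PROBS))
```

AUC and the Brier score are computed from the predicted probabilities directly, whereas accuracy, recall, and F1 depend on the chosen threshold. With a minority class of about 15%, as in the development cohort, that distinction is one plausible reason to select a model primarily by AUC and treat the threshold-dependent metrics as supplementary.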

Catheter-related bloodstream infection (CRBSI) complicated by multidrug-resistant organism (MDRO) infection is associated with high mortality and treatment failure. The delay inherent in conventional microbiological diagnosis often necessitates empirical broad-spectrum antibiotics, which exacerbates antimicrobial resistance. Our study develops and validates an interpretable machine learning model that uses readily available clinical variables to predict the risk of MDR-CRBSI at an early stage. This tool addresses a pressing clinical need by enabling timely, targeted antimicrobial therapy, thereby potentially improving patient outcomes and supporting antimicrobial stewardship efforts in the global fight against resistance.

## Full-text entities

- **Genes:** CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}
- **Diseases:** infection (MESH:D007239), CRBSI (MESH:D055499), bloodstream infection (MESH:D018805), MDR (MESH:D018088), inflammatory (MESH:D007249)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12955437/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12955437/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/PMC12955437/full.md

---
Source: https://tomesphere.com/paper/PMC12955437