# Machine learning-based prediction model for bronchoalveolar lavage efficacy in severe pneumonia

**Authors:** Ning Zhang, Lin Cui, Xiaojun Zhang

PMC · DOI: 10.3389/fmed.2025.1710355 · Frontiers in Medicine · 2026-01-07

## TL;DR

This single-center study developed and internally validated a random forest model that predicts whether bronchoalveolar lavage (BAL) will be effective in patients with severe pneumonia, using inflammatory markers and blood gas results.

## Contribution

The contribution is an internally validated random forest model that predicts BAL efficacy in patients with severe pneumonia from age, COPD status, inflammatory markers (PCT, CRP), and arterial blood gas values (PaO₂, PaCO₂).

## Key findings

- Age ≥ 60 years, comorbid COPD, PCT ≥ 2 ng/mL, CRP ≥ 100 mg/L, PaO₂ < 60 mmHg, and PaCO₂ ≥ 50 mmHg were independent risk factors for poor BAL efficacy.
- The random forest model achieved AUCs of 0.799 (training set) and 0.778 (validation set), outperforming the KNN (0.759, 0.721) and gradient boosting (0.766, 0.738) models.
- The model provides a basis for future clinical prediction and hypothesis testing but requires external validation.

## Abstract

The objective of the study was to construct and validate a predictive model for the clinical efficacy of bronchoalveolar lavage (BAL) with fiberoptic bronchoscopy in patients with community-acquired severe pneumonia based on inflammatory response indicators and blood gas analysis results.

A total of 206 patients with severe pneumonia who underwent BAL treatment in our hospital from November 2020 to November 2024 were enrolled and randomly divided into a training set (n = 144) and a validation set (n = 62) in a 7:3 ratio. In the training set, efficacy-related indicators were screened using univariate analysis. After variable selection via least absolute shrinkage and selection operator (LASSO) regression, independent factors influencing poor efficacy were determined using multivariate logistic regression analysis. Random forest (RF), K-nearest neighbor (KNN), and gradient boosting (GB) models were constructed using Python. Model performance was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), and the optimal model was selected.
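The split and the evaluation metric described above can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the function names, the fixed seed, and the use of integer patient ids are assumptions made for illustration, and the abstract does not say whether the split was stratified or which libraries were used. The AUC is computed from its pairwise definition, the probability that a randomly chosen poor-efficacy case receives a higher predicted score than a randomly chosen good-efficacy case.

```python
import random


def split_train_validation(ids, train_fraction=0.7, seed=0):
    """Shuffle patient ids and split them into training and validation lists.

    The seed and the plain (non-stratified) shuffle are illustrative assumptions.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: 1 for poor efficacy (the predicted event), 0 otherwise.
    scores: predicted probabilities or any monotone risk score.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# 206 enrolled patients split 7:3 gives 144 training and 62 validation cases.
train_ids, validation_ids = split_train_validation(range(206))

# Toy labels and scores (not study data) to show the metric.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With 206 patients, rounding 0.7 × 206 reproduces the reported set sizes of 144 and 62. The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of this size; a rank-based implementation would be preferable for large datasets.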

There were 32 cases (22.22%) and 13 cases (20.97%) with poor clinical efficacy in the training set and the validation set, respectively. In the training set, the multivariate logistic regression analysis showed that age ≥ 60 years, comorbid chronic obstructive pulmonary disease (COPD), procalcitonin (PCT) ≥ 2 ng/mL, C-reactive protein (CRP) ≥ 100 mg/L, partial pressure of arterial oxygen (PaO₂) < 60 mmHg, and partial pressure of arterial carbon dioxide (PaCO₂) ≥ 50 mmHg were independent risk factors for poor efficacy (all p < 0.05). The AUC values of the RF model (0.799 in the training set and 0.778 in the validation set) were significantly higher than those of the KNN (0.759, 0.721) and the GB (0.766, 0.738) models, making it the optimal predictive model.
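As a rough illustration of how the reported cut-offs could be turned into model inputs, the sketch below encodes the six independent risk factors as binary indicators and re-derives the poor-efficacy rates from the case counts. The `Patient` class, its field names, and the example patient are hypothetical; the abstract does not report regression coefficients or feature encodings, so no risk score is computed here.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record; field names are illustrative, not from the paper."""
    age_years: float
    has_copd: bool
    pct_ng_ml: float    # procalcitonin
    crp_mg_l: float     # C-reactive protein
    pao2_mmhg: float    # arterial oxygen partial pressure
    paco2_mmhg: float   # arterial carbon dioxide partial pressure


def risk_indicators(p):
    """Binary indicators using the thresholds reported in the abstract."""
    return {
        "age_ge_60": int(p.age_years >= 60),
        "copd": int(p.has_copd),
        "pct_ge_2": int(p.pct_ng_ml >= 2),
        "crp_ge_100": int(p.crp_mg_l >= 100),
        "pao2_lt_60": int(p.pao2_mmhg < 60),
        "paco2_ge_50": int(p.paco2_mmhg >= 50),
    }


def poor_efficacy_rate(poor_cases, total_cases):
    """Percentage of cases with poor efficacy, rounded to two decimals."""
    return round(100 * poor_cases / total_cases, 2)


# Made-up example patient, not taken from the study.
example = Patient(age_years=67, has_copd=True, pct_ng_ml=1.2,
                  crp_mg_l=130, pao2_mmhg=58, paco2_mmhg=44)
example_indicators = risk_indicators(example)

training_rate = poor_efficacy_rate(32, 144)    # reported as 22.22%
validation_rate = poor_efficacy_rate(13, 62)   # reported as 20.97%
```

The recomputed rates match the reported 22.22% and 20.97%, which indicates that the poor-efficacy proportion is similar across the two sets.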

An RF-based predictive model was successfully developed and validated to predict the efficacy of BAL treatment in patients with severe pneumonia. This model effectively identifies and quantifies key factors influencing BAL treatment outcomes in severe pneumonia, providing basic data support for subsequent hypothesis testing and clinical efficacy prediction research. However, as this is a single-center, hypothesis-generating study, the application of the model in individualized treatment planning still requires multicenter external validation and further analysis including a non-BAL control group.

## Linked entities

- **Diseases:** chronic obstructive pulmonary disease (COPD) (MONDO:0005002)

## Full-text entities

- **Genes:** CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}
- **Diseases:** pneumonia (MESH:D011014), inflammation (MESH:D007249), COPD (MESH:D029424)
- **Chemicals:** carbon dioxide (MESH:D002245), oxygen (MESH:D010100), PaCO2 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12819250/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12819250/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12819250/full.md

---
Source: https://tomesphere.com/paper/PMC12819250