# Exploration of the correlation between clinical indicators and prognosis in hospitalized children with pneumonia and construction of a risk prediction model based on machine learning algorithms

**Authors:** Jin Xue, Guangzhong He, Qiaoying Chen

PMC · DOI: 10.3389/fmed.2026.1747935 · 2026-01-28

## TL;DR

This retrospective study uses machine learning to predict which hospitalized children with pneumonia are at higher risk of poor outcomes: prolonged hospital stay (>7 days), PICU admission, or in-hospital death.

## Contribution

An XGBoost-based risk prediction model for adverse prognosis in hospitalized children with community-acquired pneumonia, built from clinical indicators and interpreted with SHAP values.

## Key findings

- The XGBoost model achieved a validation-set AUC of 0.84 (95% CI: 0.78–0.90) for predicting adverse outcomes in children with pneumonia, outperforming Random Forest and Logistic Regression.
- Admission PCT, CRP, and respiratory rate were identified as the top predictors of poor prognosis.
- Excluding the 12 COVID-19 cases left model performance essentially unchanged (AUC 0.83 versus 0.84).

## Abstract

Childhood pneumonia is a leading cause of hospitalization and death in children under 5 years globally. Prognosis varies between patients and depends on multiple clinical indicators, yet traditional assessment lacks quantitative risk stratification tools. Machine learning (ML) enables comprehensive analysis of high-dimensional clinical data, making it valuable for identifying key prognostic factors and building robust prediction models to optimize clinical decision-making.

A total of 582 hospitalized children (1 month–5 years) with community-acquired pneumonia were retrospectively enrolled (January 2022–June 2025). Demographic, laboratory (WBC, CRP, PCT, LYM%, serum albumin), vital sign, and underlying disease data were collected. Adverse prognosis was defined as a composite of prolonged hospitalization (>7 days), PICU admission, or in-hospital death. Patients were randomly split into training (n = 407) and validation (n = 175) sets (7:3), using stratified random sampling to keep the adverse prognosis rate consistent between the two sets despite class imbalance. XGBoost, Random Forest (RF), and Logistic Regression (LR) models were constructed, and performance was evaluated by AUC, accuracy, sensitivity, and specificity. SHAP values were used to analyze indicator importance. Missing data (all < 5%) were handled by mean imputation; a sensitivity analysis comparing mean imputation with multiple imputation confirmed no significant impact on model performance.
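The stratified 7:3 split described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the seed, and the synthetic label vector (121 adverse outcomes among 582 children, matching the reported totals) are assumptions made for the example. Because it rounds within each class, this sketch yields 408/174 rather than the reported 407/175; the exact allocation rule used in the study is not stated.

```python
import random


def stratified_split(labels, val_fraction=0.3, seed=0):
    """Split indices so that each class contributes the same fraction to validation.

    Hypothetical helper for illustration; the study's actual tooling is not stated.
    """
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for cls in sorted(set(labels)):
        # Collect and shuffle the indices belonging to this class only.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_val = round(len(idx) * val_fraction)
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Synthetic labels: 1 = adverse prognosis, 0 = no adverse prognosis.
labels = [1] * 121 + [0] * 461
train_idx, val_idx = stratified_split(labels)

# Event rates in the two subsets should be nearly identical.
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
val_rate = sum(labels[i] for i in val_idx) / len(val_idx)
print(len(train_idx), len(val_idx), round(train_rate, 3), round(val_rate, 3))
```

Splitting each class separately is what keeps the adverse prognosis rate close to the overall 20.8% in both subsets; a plain random split gives no such guarantee.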

Adverse prognosis occurred in 121 (20.8%) children. The XGBoost model outperformed RF and LR, with a validation-set AUC of 0.84 (95% CI: 0.78–0.90), accuracy of 81.1%, sensitivity of 78.6%, and specificity of 82.3%. Model calibration was verified with the Hosmer-Lemeshow test (p = 0.312), indicating good agreement between predicted and observed risks. The top five indicators were admission PCT, CRP, respiratory rate, age < 6 months, and blood oxygen saturation. PCT > 2 ng/mL (OR = 3.95) and CRP > 40 mg/L (OR = 3.52) significantly increased adverse prognosis risk. Etiological data (viral, bacterial, mixed infection) were unavailable in 41.2% (240/582) of cases; among available data (342/582), 58.5% (200/342) were viral (including 12 cases of COVID-19), 32.2% (110/342) bacterial, and 9.3% (32/342) mixed infections. Sensitivity analysis excluding COVID-19 cases (n = 12) showed no substantial change in model performance (AUC = 0.83, 95% CI: 0.77–0.89).
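For readers less familiar with the reported metrics, the sketch below shows how AUC, accuracy, sensitivity, and specificity are computed from predicted risks. The ten labels and scores are invented toy values, not study data, and the 0.5 decision threshold is an assumption, since the operating threshold used in the paper is not given here.

```python
def confusion_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a fixed decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),  # share of adverse cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-adverse cases that were cleared
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = adverse prognosis.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7, 0.6, 0.05]

metrics = confusion_metrics(y_true, scores)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 3))
```

AUC is threshold-free, whereas sensitivity, specificity, and accuracy all depend on the chosen cutoff, which is why the two kinds of metric are usually reported together.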

The XGBoost-based model effectively identifies high-risk children with pneumonia, with PCT, CRP, and respiratory rate as key predictors. It provides a practical tool for clinical risk stratification and personalized management. The model’s cutoffs for PCT (>2 ng/mL) and CRP (>40 mg/L) align with existing pediatric pneumonia predictive scores (e.g., PRIEST score) but offer improved discriminative power by integrating multi-dimensional indicators and ML-driven interactions.
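The odds ratios reported for the PCT and CRP cutoffs can be read through a simple 2×2 calculation. The sketch below computes an unadjusted odds ratio with a Woolf (log-based) 95% confidence interval. It rests on two assumptions: the summary does not say whether the published ORs are unadjusted or model-adjusted, and the counts used here are hypothetical, chosen only to give a round result.

```python
import math


def odds_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total, z=1.96):
    """Unadjusted odds ratio with a Woolf confidence interval from a 2x2 table."""
    a = exposed_events                      # above cutoff, adverse outcome
    b = exposed_total - exposed_events      # above cutoff, no adverse outcome
    c = unexposed_events                    # below cutoff, adverse outcome
    d = unexposed_total - unexposed_events  # below cutoff, no adverse outcome
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: 40 of 100 children above a biomarker cutoff had an
# adverse outcome, versus 20 of 140 children at or below it.
or_value, ci_low, ci_high = odds_ratio(40, 100, 20, 140)
print(round(or_value, 2), round(ci_low, 2), round(ci_high, 2))
```

An interval that excludes 1 is what "significantly increased risk" corresponds to in this framing; an adjusted OR from a regression model would additionally account for the other indicators.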

## Linked entities

- **Diseases:** pneumonia (MONDO:0005249), COVID-19 (MONDO:0100096)

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, CALCA (calcitonin related polypeptide alpha) [NCBI Gene 796] {aka CALC1, CGRP, CGRP-I, CGRP-alpha, CGRP1, CT}
- **Diseases:** infection (MESH:D007239), death (MESH:D003643), COVID-19 (MESH:D000086382), pneumonia (MESH:D011014)
- **Chemicals:** oxygen (MESH:D010100)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12891092/full.md

---
Source: https://tomesphere.com/paper/PMC12891092