# Machine Learning Model for Sepsis Prediction in Prolonged and Chronic Critical Illness: Development and Validation Using Retrospective Real-World ICU Data

**Authors:** Mikhail Ya. Yadgarov, Olga Yu. Rebrova, Levan B. Berikashvili, Petr A. Polyakov, Kristina K. Kadantseva, Alexey A. Yakovlev, Andrey V. Grechko, Valery V. Likhvantsev

PMC · DOI: 10.3390/jcm15020777 · 2026-01-18

## TL;DR

This study develops a machine learning model to predict sepsis in patients with prolonged or chronic critical illness (PCI/CCI). The model discriminates well within its target cohort (AUROC 0.82) but fails to generalize to external ICU populations with acute critical illness (AUROC 0.47).

## Contribution

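The study develops and validates a sepsis prediction model built specifically for patients with prolonged or chronic critical illness (PCI/CCI), a population for which, according to the authors, no such machine learning model previously existed. It uses retrospective real-world ICU data.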

## Key findings

- A PCI/CCI-focused XGBoost model achieved an AUROC of 0.82 in its own cohort (RICD) but only 0.47 on external PhysioNet ICU data.
- A universal model trained on combined RICD and PhysioNet data showed reduced discrimination in PCI/CCI patients (AUROC mean difference 0.02, p = 0.0012).
- Respiratory rate, heart rate, body temperature, and age were among the most important features for prediction.
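For context on the discrimination metric reported above: AUROC can be read as the probability that a randomly chosen sepsis case receives a higher risk score than a randomly chosen non-case, so 0.5 corresponds to chance and a value such as 0.47 is marginally below it. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `auroc` and the toy labels and scores are hypothetical and are not study data.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) AUROC.

    Fraction of (positive, negative) pairs in which the positive case
    has the higher score; tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = sepsis window, 0 = control window.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auroc = auroc(labels, scores)  # 5 of 6 pairs ranked correctly
print(round(toy_auroc, 3))
```

The quadratic loop is deliberate for readability; on cohorts of realistic size a rank-based implementation would be the more practical choice.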

## Abstract

Background: No machine learning (ML) models for sepsis prediction have been specifically developed for patients with prolonged or chronic critical illness (PCI/CCI). Objective: This study aimed to develop and validate an ML-based sepsis prediction model for this cohort. Methods: We analyzed ICU admissions from the Russian Intensive Care Dataset (RICD, 575 patients with PCI/CCI) and two public ICU datasets from PhysioNet (>40,000 patients with acute critical illness). Models were trained within a right-aligned prediction framework using a case–crossover–control sampling approach and a 6 h prediction window. Two strategies were evaluated: (1) a PCI/CCI-focused model trained on RICD with external testing on PhysioNet data and (2) a universal model trained on combined RICD and PhysioNet cohorts. Models were developed with tree-based algorithms (XGBoost, LightGBM, Random Forest, AdaBoost), with internal and external validation. The primary outcome was model discrimination (AUROC). Subgroup analyses were performed for sepsis phenotypes. Results: The PCI/CCI-focused XGBoost model achieved an AUROC of 0.82 in the RICD cohort but failed to generalize to external ICU populations (AUROC 0.47). A universal model trained on mixed data demonstrated reduced discrimination in PCI/CCI patients (AUROC mean difference 0.02, p = 0.0012). Respiratory rate, heart rate, body temperature, and age were among the most important features. Predictive performance was higher in the hypoinflammatory sepsis phenotype (AUROC 0.84 vs. 0.81 for hyperinflammatory, p < 0.001). Despite a low positive predictive value (up to 21%) for the PCI/CCI-focused model, the negative predictive value exceeded 97%. Conclusions: A right-aligned ML model tailored to PCI/CCI demonstrated strong internal performance for sepsis exclusion but limited cross-population generalizability, underscoring the need for population-specific prediction models and prospective validation before clinical application.
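The pairing of a low positive predictive value with a high negative predictive value is what Bayes' rule yields when the outcome is uncommon among the evaluated prediction windows. The sketch below illustrates that arithmetic only. The sensitivity, specificity, and prevalence values are assumptions chosen for illustration and are not figures reported in the paper, and `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative assumptions only (not values from the study):
# 80% sensitivity, 75% specificity, sepsis in 5% of prediction windows.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.75, prevalence=0.05)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumed inputs most positive alerts are false alarms while a negative result is rarely wrong, which is consistent with the abstract's framing of the model as more useful for sepsis exclusion than for confirmation.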

## Full-text entities

- **Diseases:** Sepsis (MESH:D018805), Critical Illness (MESH:D016638)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12841784/full.md

---
Source: https://tomesphere.com/paper/PMC12841784