# Early risk stratification of sepsis-related liver injury via machine learning: a multicohort study

**Authors:** Xin Chen, Pinwen Zhou, Jiaqi Wang, Li Zhang, Tingbin Xie, Wei Ma, Xinying Wang

PMC · DOI: 10.3389/fmed.2026.1649101 · Frontiers in Medicine · 2026-01-27

## TL;DR

This study uses machine learning on data from the first 24 h of ICU admission to predict which sepsis patients will go on to develop liver injury, helping identify high-risk individuals for timely intervention.

## Contribution

The study develops and compares machine learning models for early risk stratification of sepsis-related liver injury and identifies a random forest model as the best performer.

## Key findings

- The random forest model achieved an ROC-AUC of 0.867 and a PR-AUC of 0.392 in internal validation.
- Total bilirubin, international normalized ratio, sequential organ failure assessment, logistic organ dysfunction system, and prothrombin time were the most important predictors of sepsis-related liver injury according to SHAP values.
- In external validation on 120 sepsis patients, the model achieved an ROC-AUC of 0.862 and a PR-AUC of 0.735.

## Abstract

Sepsis-related liver injury (SRLI) is associated with poor prognosis and high morbidity in septic patients. Early mitigation of liver injury is crucial for improving outcomes in the critically ill. However, early detection and intervention remain challenging, due in part to the lack of effective diagnostic and screening strategies. This study aimed to apply machine learning (ML) approaches to identify significant predictors for the onset of SRLI, with the goal of facilitating early identification of high-risk patients.

This retrospective study utilized data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, divided into training and internal validation cohorts. An additional external validation cohort consisted of 120 sepsis patients from Nanjing Jinling Hospital. We constructed seven ML models and evaluated two conventional assessment scales to predict the risk of SRLI in patients who did not meet the SRLI criteria within the first 24 h of ICU admission. The Boruta algorithm was employed for feature selection. Hyperparameter tuning was performed on the training set using grid search. Model performance was evaluated by the area under the receiver operating characteristic curve (ROC-AUC) and the precision–recall area under the curve (PR-AUC), along with specificity, sensitivity, accuracy, and F1 score. The clinical utility of the models was evaluated using decision curve analysis. Shapley additive explanation (SHAP) was used to provide clinicians with an intuitive understanding of the machine learning model.
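To make the two headline metrics concrete, the following is a minimal Python sketch of how ROC-AUC and PR-AUC (approximated here by average precision) can be computed from predicted risks. The function names and the toy labels and scores are illustrative assumptions, not the authors' code or data; in practice a library routine such as scikit-learn's `roc_auc_score` and `average_precision_score` would typically be used.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a randomly chosen positive case
    is scored higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision values at the rank of each
    positive case, with cases sorted by descending score.
    Simplification: assumes no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy example (hypothetical values): 1 = developed liver injury, 0 = did not.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
toy_scores = [0.10, 0.20, 0.80, 0.30, 0.40, 0.60, 0.05, 0.90]

toy_roc_auc = roc_auc(toy_labels, toy_scores)
toy_pr_auc = average_precision(toy_labels, toy_scores)
print(f"ROC-AUC: {toy_roc_auc:.3f}, PR-AUC (average precision): {toy_pr_auc:.3f}")
```

For a model with no discriminative ability, ROC-AUC sits near 0.5 whereas average precision sits near the event rate, so PR-AUC values are best read alongside the prevalence of the outcome in each cohort.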

After applying exclusion criteria, 9,434 sepsis patients from MIMIC-IV were included for model development. The Random Forest (RF) model demonstrated superior overall predictive performance in internal validation, achieving an ROC-AUC of 0.867 and a PR-AUC of 0.392. Decision curve analysis indicated that the RF model provided a positive net benefit across a wide range of high-risk thresholds. In the RF model, total bilirubin, international normalized ratio, sequential organ failure assessment, logistic organ dysfunction system, and prothrombin time, measured during the initial 24 h following ICU admission, were the most important predictors according to SHAP values. In external validation, the RF model also outperformed all others (ROC-AUC: 0.862, PR-AUC: 0.735).
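The net benefit reported by decision curve analysis follows a standard formula: the true-positive rate per patient minus the false-positive rate per patient, weighted by the odds of the chosen risk threshold. The sketch below illustrates that calculation in Python under stated assumptions; the function names and the toy data are hypothetical and do not reproduce the study's analysis.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold:
    TP/N - FP/N * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example (hypothetical values): 1 = developed liver injury, 0 = did not.
dca_labels = [0, 0, 1, 0, 1, 0, 0, 1]
dca_probs = [0.10, 0.20, 0.80, 0.30, 0.40, 0.60, 0.05, 0.90]

# A decision curve plots these quantities over a grid of thresholds;
# the "treat none" strategy has a net benefit of 0 at every threshold.
for t in (0.1, 0.3, 0.5, 0.7):
    model_nb = net_benefit(dca_labels, dca_probs, t)
    all_nb = net_benefit_treat_all(dca_labels, t)
    print(f"threshold={t:.1f}  model={model_nb:+.3f}  treat-all={all_nb:+.3f}")
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero (treat none).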

Our study explored ML-based models for the early prediction of SRLI in patients with sepsis, and the random forest model performed best. The significant predictive contribution of prothrombin time highlights its potential as a key monitoring marker for early risk stratification in septic patients.

## Full-text entities

- **Diseases:** septic (MESH:D001170), dysfunction (MESH:D006331), failure (MESH:D051437), SRLI (MESH:D017093), critically ill (MESH:D016638), sepsis (MESH:D018805)
- **Chemicals:** bilirubin (MESH:D001663)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12886363/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12886363/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12886363/full.md

---
Source: https://tomesphere.com/paper/PMC12886363