# An interpretable machine learning model for predicting sepsis risk in ICU patients with non-traumatic subarachnoid hemorrhage: development and validation using the MIMIC-IV database

**Authors:** Shaojie Guo, Yang Liu, Jing Xia, Ang Li, Xinchen Ma, Yong Chen, Jv Wang, Bingsha Han, Gaofeng Li, Guang Feng

PMC · DOI: 10.3389/fneur.2026.1734264 · Frontiers in Neurology · 2026-01-21

## TL;DR

This study developed a machine learning model to predict sepsis risk in ICU patients with non-traumatic subarachnoid hemorrhage, using data from the MIMIC-IV database.

## Contribution

The study introduces an interpretable machine learning model, explained with SHAP analysis, for predicting sepsis risk in ICU patients with non-traumatic subarachnoid hemorrhage.

## Key findings

- The CATBoost model achieved an AUC of 0.887 in the test set, showing strong predictive performance.
- Pneumonia, norepinephrine use, and mechanical ventilation were identified as top features influencing sepsis risk.
- The model demonstrated excellent stability with less than 2% performance fluctuation between training and test sets.

## Abstract

This study aimed to develop and validate a machine learning (ML) prediction model for assessing the risk of sepsis in intensive care unit (ICU) patients with non-traumatic subarachnoid hemorrhage (SAH), thereby providing a reference for the early clinical identification of high-risk patients.

We conducted a retrospective cohort study using data from the Medical Information Mart for Intensive Care (MIMIC-IV) database, which includes admissions between 2008 and 2022. We extracted demographic information, laboratory parameters, complications, and other clinical data. Patients were randomly divided into a training set and a test set in an 8:2 ratio. Least Absolute Shrinkage and Selection Operator (Lasso) regression was used to identify core predictive features. Fourteen machine learning models were constructed, including Random Forest, Gradient Boosting, Kernel-based SVM, Logistic Regression, K-Nearest Neighbors, Partial Least Squares, Boosting Method, Neural Network, Naive Bayes, Discriminant Analysis, Lasso, XGBoost, CATBoost, and LightGBM. Key evaluation metrics included sensitivity, specificity, accuracy, F1 score, Youden index, and the area under the curve (AUC). SHapley Additive exPlanations (SHAP) analysis was employed to interpret the model’s decision logic, and Decision Curve Analysis (DCA) was used to assess clinical utility.
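The threshold-based metrics named above can all be derived from a single confusion matrix. The sketch below shows one way to compute them in plain Python. `ConfusionCounts` and `threshold_metrics` are hypothetical names introduced here, and the counts are illustrative values, not data from the study. AUC is left out because it is computed from ranked scores rather than from one decision threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts at one decision threshold (hypothetical container, not from the paper)."""
    tp: int  # septic patients flagged as high risk
    fp: int  # non-septic patients flagged as high risk
    tn: int  # non-septic patients flagged as low risk
    fn: int  # septic patients flagged as low risk


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def threshold_metrics(c: ConfusionCounts) -> dict:
    """Compute sensitivity, specificity, accuracy, F1 score, and Youden index."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    youden = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "youden": youden,
    }


# Illustrative counts only; they are NOT taken from the study.
example = ConfusionCounts(tp=30, fp=20, tn=80, fn=10)
metrics = threshold_metrics(example)
```

The Youden index summarizes sensitivity and specificity in one number, which is why it is often used to pick an operating threshold on the ROC curve.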

A total of 1,052 patients with non-traumatic SAH were enrolled, with 841 assigned to the training set and 211 to the test set. Lasso regression identified 11 core predictive features, including pneumonia, norepinephrine use, mechanical ventilation, Glasgow Coma Scale (GCS) grade, and acute kidney injury (AKI). The CATBoost model demonstrated the best performance: in the training set, it achieved an AUC of 0.889, sensitivity of 73.2%, specificity of 85.9%, and a Youden index of 0.592; in the test set, it achieved an AUC of 0.887, sensitivity of 75.5%, specificity of 82.3%, and a Youden index of 0.578. Performance fluctuation between the training and test sets was less than 2%, indicating excellent stability. SHAP analysis revealed that pneumonia, norepinephrine use, and mechanical ventilation were the top three features influencing sepsis risk, with pneumonia significantly increasing the risk. DCA results showed that the CATBoost model had the highest net benefit in the high-risk threshold range of 0.2–0.6.
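Two of the quantities above follow from simple formulas. The sketch below recomputes the Youden index (sensitivity + specificity − 1) from the reported sensitivities and specificities, and shows the standard net-benefit formula that underlies decision curve analysis. The function names are hypothetical, and the counts passed to `net_benefit` are illustrative, not taken from the study.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit of a model at a given risk threshold.

    Standard DCA formula: TP/n - (FP/n) * pt / (1 - pt), where pt is the
    threshold probability at which a clinician would choose to intervene.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    """Net benefit of the 'treat everyone' reference strategy."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Recompute J from the sensitivities and specificities reported in the abstract.
train_j = youden_index(0.732, 0.859)  # about 0.591
test_j = youden_index(0.755, 0.823)   # about 0.578

# Illustrative counts only (NOT from the study): 140 patients, 40 with sepsis.
model_nb = net_benefit(tp=30, fp=20, n=140, threshold=0.2)
treat_all_nb = net_benefit_treat_all(prevalence=40 / 140, threshold=0.2)
```

The recomputed training-set value (about 0.591) differs from the reported 0.592 in the third decimal, which is plausibly an effect of rounding in the published percentages. In a decision curve, a model is useful at a given threshold when its net benefit exceeds both the treat-all line and zero (the treat-none strategy).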

The machine learning model developed based on the MIMIC-IV database can effectively predict the risk of sepsis in ICU patients with non-traumatic SAH. It demonstrates good interpretability and clinical utility, providing a basis for clinical risk stratification and precise intervention.

## Linked entities

- **Diseases:** pneumonia (MONDO:0005249), acute kidney injury (MONDO:0002492)

## Full-text entities

- **Diseases:** sepsis (MESH:D018805), SAH (MESH:D013345), AKI (MESH:D058186), pneumonia (MESH:D011014)
- **Chemicals:** norepinephrine (MESH:D009638)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12867871/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12867871/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12867871/full.md

---
Source: https://tomesphere.com/paper/PMC12867871