# Machine Learning Algorithms to Predict Venous Thromboembolism in Patients With Sepsis in the Intensive Care Unit: Multicenter Retrospective Study

**Authors:** Yan Zhang, Xia Ren, Luojie Liu, Junjie Zha, Yijie Gu, Hongwei Ye

PMC · DOI: 10.2196/80969 · JMIR Medical Informatics · 2026-01-30

## TL;DR

This study developed an interpretable machine learning model to predict venous thromboembolism (blood clots) in ICU patients with sepsis. The best model reached an AUC of 0.956 in internal validation and 0.786 in external validation.

## Contribution

A novel interpretable machine learning model for VTE prediction in ICU patients with sepsis, validated in an independent external cohort and across sepsis severity levels.

## Key findings

- The light gradient boosting machine model achieved an AUC of 0.956 in internal validation and 0.786 in external validation.
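- In the external cohort, the model discriminated better in the severe sepsis subgroup (Sequential Organ Failure Assessment score >6; AUC 0.816) than in the mild to moderate subgroup (AUC 0.769).
- SHAP analysis identified central venous catheterization, serum chloride and bicarbonate levels, arterial catheterization, and prolonged partial thromboplastin time as the most influential predictors of VTE risk.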

## Abstract

Venous thromboembolism (VTE) is a common and severe complication in intensive care unit (ICU) patients with sepsis. Conventional risk stratification tools lack sepsis-specific features and may inadequately capture complex, nonlinear interactions among clinical variables.

This study aimed to develop and validate an interpretable machine learning (ML) model for the early prediction of VTE in ICU patients with sepsis.

This multicenter retrospective study used data from the Medical Information Mart for Intensive Care IV database for model development and internal validation, and an independent cohort from Changshu Hospital for external validation. Candidate predictors were selected through univariate analysis, followed by least absolute shrinkage and selection operator regression. Retained variables were entered into multivariable logistic regression to identify independent predictors, which were then used to develop 9 ML models: categorical boosting, decision tree, k-nearest neighbor, light gradient boosting machine, logistic regression, multilayer perceptron, naive Bayes, random forest, and support vector machine. Performance was evaluated by discrimination (area under the curve [AUC]), calibration, and clinical utility (decision curve analysis). A subgroup analysis stratified by the Sequential Organ Failure Assessment score was conducted in the external cohort to assess model stability across sepsis severity levels. Model interpretability was assessed using Shapley Additive Explanations (SHAP) to quantify the contribution of features to the predicted risk.
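As an illustration of two of the evaluation metrics named above, the following is a minimal sketch of how AUC and the net benefit used in decision curve analysis can be computed. It is not the authors' code: the function names (`auc`, `net_benefit`, `net_benefit_treat_all`) and the 10-patient toy data are assumptions made up for this example. AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and net benefit at a threshold probability p_t uses the standard definition TP/n − (FP/n) × p_t/(1 − p_t).

```python
def auc(y_true, y_score):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy in a decision curve: treat every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (NOT from the study): 1 = VTE, 0 = no VTE.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.02, 0.05, 0.10, 0.12, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]

toy_auc = auc(y_true, y_score)
toy_nb_model = net_benefit(y_true, y_score, threshold=0.25)
toy_nb_all = net_benefit_treat_all(y_true, threshold=0.25)

if __name__ == "__main__":
    print(f"AUC: {toy_auc:.3f}")
    print(f"Net benefit (model, p_t=0.25): {toy_nb_model:.3f}")
    print(f"Net benefit (treat all, p_t=0.25): {toy_nb_all:.3f}")
```

A decision curve repeats the net benefit calculation over a range of thresholds and compares the model against the treat-all and treat-none (net benefit 0) strategies. The pairwise AUC loop is quadratic in cohort size and is meant only to make the definition explicit; a rank-based implementation would be preferable for a cohort of tens of thousands of patients.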

A total of 25,197 patients from the Medical Information Mart for Intensive Care IV cohort and 328 patients from the external cohort were included, with VTE incidences of 844 out of 25,197 (3.4%) and 30 out of 328 (9.2%), respectively. The light gradient boosting machine model performed best, achieving an AUC of 0.956 in internal validation. Despite the higher VTE incidence and clinical severity in the external validation cohort, the model generalized with an AUC of 0.786. Notably, the model achieved enhanced discriminative ability in the severe sepsis subgroup (Sequential Organ Failure Assessment score >6) with an AUC of 0.816, compared with 0.769 in the mild to moderate sepsis subgroup. Calibration curves indicated strong agreement between predicted and observed outcomes, and decision curve analysis showed superior net benefit across clinically relevant thresholds. SHAP analysis identified central venous catheterization, serum chloride and bicarbonate levels, arterial catheterization, and prolonged partial thromboplastin time as the most influential predictors. Partial dependence plots revealed both linear and nonlinear associations between these variables and VTE risk. Individual-level force plots further enhanced interpretability by visualizing personalized risk profiles.
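To make the idea behind the SHAP attributions concrete, here is a minimal sketch that computes exact Shapley values by brute force for a toy risk function. Everything in it is hypothetical: the three binary flags (`cvc`, `bicarbonate_flag`, `ptt_flag`), the `toy_risk` function, and its coefficients are arbitrary numbers chosen for illustration, not estimates or effect directions from the study. SHAP tooling for tree models typically uses a much faster tree-specific algorithm rather than this enumeration over feature subsets.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all feature subsets.

    value_fn maps a frozenset of "present" features to a model output.
    Cost grows exponentially with the number of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of each coalition of size k that excludes f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical risk function with one interaction term (invented numbers)."""
    risk = 0.03  # baseline risk when no flag is present
    if "cvc" in present:
        risk += 0.05
    if "bicarbonate_flag" in present:
        risk += 0.02
    if "ptt_flag" in present:
        risk += 0.01
    if "cvc" in present and "bicarbonate_flag" in present:
        risk += 0.04  # interaction, split evenly between the two features
    return risk


features = ["cvc", "bicarbonate_flag", "ptt_flag"]
phi = shapley_values(toy_risk, features)

if __name__ == "__main__":
    for name, value in phi.items():
        print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the risk with all flags present and the baseline risk, which is the additivity property that force plots rely on when they display how each feature pushes an individual prediction away from the baseline.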

We developed a high-performing and interpretable ML model for predicting VTE in ICU patients with sepsis. The model demonstrated robustness across cohorts and enhanced performance in the severe sepsis population. By integrating diverse clinical data and leveraging SHAP for transparent explanations, this tool may support personalized prophylaxis and early diagnostic strategies.

## Linked entities

- **Diseases:** Venous thromboembolism (MONDO:0005399)

## Full-text entities

- **Diseases:** VTE (MESH:D054556), Sepsis (MESH:D018805), Sequential Organ Failure (MESH:D009102)
- **Chemicals:** chloride (MESH:D002712), bicarbonate (MESH:D001639)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12905564/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12905564/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12905564/full.md

---
Source: https://tomesphere.com/paper/PMC12905564