# Development of an interpretable machine learning model for predicting venous thromboembolism in intensive care unit patients with intracerebral hemorrhage

**Authors:** Menghui He, Wenyan Liu, Zhongsheng Lu, Yiwei Lv, Qiang Zhang, Xiaoqing Jin, Pei Han

PMC · DOI: 10.3389/fneur.2025.1691549 · Frontiers in Neurology · 2026-01-07

## TL;DR

This study develops an interpretable machine learning model that predicts venous thromboembolism (blood clots in the veins) in ICU patients with intracerebral hemorrhage, with the aim of supporting individualized prevention decisions.

## Contribution

An interpretable XGBoost model, explained with SHAP, for predicting VTE in ICU patients with ICH.

## Key findings

- The XGBoost model achieved AUC values of 0.936, 0.778, and 0.761 in the training, test, and external validation sets, respectively.
- Top influential features included ICU stay duration, age, prothrombin time, and triglycerides.

## Abstract

Venous thromboembolism (VTE) is a frequent and potentially life-threatening complication in patients with intracerebral hemorrhage (ICH) in intensive care units (ICU). However, the necessity of prophylactic anticoagulation therapy for these patients remains controversial. This study aims to develop an interpretable machine learning (ML) model to accurately predict the risk of VTE in critically ill ICH patients, thereby enabling timely and individualized preventive measures.

A retrospective analysis was performed on clinical data from the MIMIC-IV database and ICU patients diagnosed with ICH at Qinghai Provincial People’s Hospital. After data preprocessing, 1,545 cases from the MIMIC-IV database were randomly divided into a training set (1,097 cases) and a test set (448 cases) in a 7:3 ratio. Data from 151 ICH patients treated in the ICU of Qinghai Provincial People’s Hospital between January 2020 and December 2024 were utilized as an external validation set. The Least Absolute Shrinkage and Selection Operator (LASSO) algorithm was applied for feature selection. Model performance was assessed using metrics including the area under the curve (AUC), decision curve analysis (DCA), accuracy, positive predictive value (PPV), and negative predictive value (NPV). The optimal model was further explained using the SHapley Additive exPlanations (SHAP) method.
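The evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the toy labels, and the toy scores are assumptions made up for illustration. The AUC uses the standard rank (Mann-Whitney) formulation, and the net benefit uses the usual decision curve analysis formula at a single probability threshold.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float) -> Dict[str, float]:
    """Accuracy, PPV, NPV and DCA net benefit at one threshold (0 < threshold < 1)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        # Net benefit: true positives minus false positives weighted by the
        # odds of the threshold probability, both per patient.
        "net_benefit": tp / n - fp / n * threshold / (1 - threshold),
    }


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = VTE, 0 = no VTE; scores are made-up predicted risks.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.05]

print("AUC:", auc(toy_labels, toy_scores))
print(threshold_metrics(toy_labels, toy_scores, threshold=0.5))
```

The pairwise AUC loop is quadratic in the number of cases, which is acceptable for a sketch; a sorted-rank implementation or an established library routine would be the usual choice for cohorts of the size described here.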

The XGBoost model exhibited the best predictive performance, with AUC values of 0.936, 0.778, and 0.761 for the training set, test set, and external validation set, respectively. Feature importance analysis identified the top 10 influential features as follows: ICU stay duration, age, prothrombin time, triglycerides, albumin, body mass index, partial thromboplastin time, blood glucose, white blood cell count, and systolic blood pressure.

The XGBoost model accurately predicts VTE occurrence in ICH patients in the ICU. By employing the SHAP method, it is possible to precisely assess the impact of various pathophysiological parameters on individual patient predictions, thereby providing robust support for personalized risk stratification and preventive treatment.
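To make the idea behind per-patient SHAP attributions concrete, the sketch below computes exact Shapley values for a toy risk function by enumerating every feature coalition. It is an illustration under stated assumptions, not the study's model or the SHAP library: `toy_risk` and its coefficients are invented, the patient and baseline values are hypothetical, and "absent" features are simply set to a baseline value, whereas SHAP implementations typically average over background data and use faster tree-specific algorithms.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(model: Callable[[Dict[str, float]], float],
                   x: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    names = list(x)
    n = len(names)

    def value(coalition: set) -> float:
        # Features in the coalition take the patient's value; the rest
        # are held at the baseline (a simplifying assumption).
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(f: Dict[str, float]) -> float:
    """Hypothetical risk score with made-up coefficients and one interaction term."""
    return (0.03 * f["icu_days"] + 0.004 * f["age"] + 0.01 * f["pt_seconds"]
            + 0.0002 * f["icu_days"] * f["age"])


# Hypothetical patient and reference values, for illustration only.
patient = {"icu_days": 10.0, "age": 70.0, "pt_seconds": 15.0}
baseline = {"icu_days": 3.0, "age": 60.0, "pt_seconds": 12.0}

contributions = shapley_values(toy_risk, patient, baseline)
for feature, phi in contributions.items():
    print(f"{feature}: {phi:+.4f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a single prediction be decomposed feature by feature. Exact enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.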

## Linked entities

- **Diseases:** intracerebral hemorrhage (MONDO:0013792), venous thromboembolism (MONDO:0005399)

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** critically ill (MESH:D016638), VTE (MESH:D054556), ICH (MESH:D002543)
- **Chemicals:** triglycerides (MESH:D014280), glucose (MESH:D005947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12819677/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12819677/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12819677/full.md

---
Source: https://tomesphere.com/paper/PMC12819677