# Machine learning-based risk of pulmonary embolism in stroke patients with lower extremity deep vein thrombosis: construction and validation of a prediction model

**Authors:** Li Wu, Luo Yefangxin, Rong Liu, Wei Chen, Wanting Shi, Qiong Qin, Darong Lu, Jiexin Sheng

PMC · DOI: 10.3389/fneur.2026.1710381 · Frontiers in Neurology · 2026-02-26

## TL;DR

This study uses machine learning to predict the risk of pulmonary embolism in stroke patients with deep vein thrombosis, aiming to improve early detection and outcomes.

## Contribution

A Random Forest Classifier (RFC) model is developed and internally validated for predicting pulmonary embolism risk in stroke patients with lower extremity DVT.

## Key findings

- The RFC model achieved an AUC of 0.77 and high sensitivity (0.918) in predicting pulmonary embolism.
- Key predictors included oxygen partial pressure, hypertension history, and D-dimer levels.

## Abstract

Up to 42% of stroke patients are susceptible to lower extremity deep vein thrombosis (DVT). A thrombus dislodged from the deep veins of a stroke patient can cause fatal pulmonary embolism (PE), which has an insidious onset and a high mortality rate. Because the risk factors for PE in stroke patients with DVT are not yet well known to clinical staff, PE is easily underdiagnosed or misdiagnosed. In addition, CT pulmonary angiography (CTPA) cannot be performed routinely for screening. This study used machine learning to build a fast and accurate screening model for PE in stroke patients with lower extremity DVT.

This retrospective study included all patients admitted with stroke who developed lower extremity DVT between January 2019 and April 2024. The analysis covered patient demographics, medical history and comorbidities, clinical signs, laboratory indices, hospitalization details, and medication. LASSO regression was used for feature selection and dimensionality reduction, models were built with five machine learning algorithms, and internal validation was completed. The SMOTE algorithm was used for oversampling to address class imbalance. Hierarchical k-fold, class weights, random search, and self-stepping policy tuning were used to prevent overfitting and to optimize the models. Feature attributions were quantified using SHAP.
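The oversampling step named above can be illustrated with a minimal, SMOTE-style sketch. This is not the authors' implementation: the function name `smote_like`, the parameter `k`, and the toy data are assumptions made for illustration, and a real analysis would more likely rely on a maintained library implementation. The sketch shows only the core idea of creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Simplified sketch of the SMOTE idea, not a full implementation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical values, not taken from the study).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
majority_count = 10

# Generate enough synthetic points to match the majority class size.
new_points = smote_like(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

When oversampling is combined with k-fold validation, it is generally applied to the training folds only, so that synthetic points never leak into the data used for evaluation. Whether the study did this is not stated in this summary.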

A total of 337 patients were enrolled, of whom 24 developed pulmonary embolism. LASSO regression selected 11 predictor variables for model construction. Among the five machine learning models, the Random Forest Classifier (RFC) performed best, with area under the curve (AUC) = 0.77, accuracy = 0.721, sensitivity = 0.918, precision = 0.750, F1 score = 0.826, PR-AUC = 0.895, and Brier score = 0.172, all of which were higher than those of the other models. Ranked by SHAP from highest to lowest, the features were oxygen partial pressure, history of hypertension, D-dimer, serum creatinine, severe lung disease, time in bed ≥72 h, stroke type, heart failure, use of acid-producing drugs, chest pain, and dyspnea.
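As a quick consistency check on the reported metrics, the short sketch below recomputes the F1 score from the reported precision and sensitivity and derives the PE event rate from the enrollment counts. The helper names `f1_from` and `brier_score` and the toy probabilities are illustrative assumptions; only the four reported numbers (0.750, 0.918, 24, 337) come from the abstract.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome.
    Lower values indicate better-calibrated probabilistic predictions."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Values reported in the abstract.
reported_precision = 0.750
reported_sensitivity = 0.918
f1 = f1_from(reported_precision, reported_sensitivity)
print(round(f1, 3))  # 0.826, matching the reported F1 score

# Event rate: 24 PE cases among 337 enrolled patients.
pe_rate = 24 / 337
print(round(pe_rate, 3))  # 0.071

# Toy Brier score on made-up predictions (not study data).
toy_brier = brier_score([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4])
```

The event rate of roughly 7% illustrates why class imbalance is a concern for this dataset.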

Five machine learning models were built to assess the likelihood of pulmonary embolism in stroke patients with lower extremity deep vein thrombosis, among which the RFC model performed best. SHAP-based feature attributions can help clinicians promptly recognize and assess patients at risk of PE, so that preventive and therapeutic measures can be taken early to improve patient prognosis.
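SHAP attributions are based on Shapley values, which split a prediction's deviation from a baseline among the input features. The sketch below computes exact Shapley values for a tiny hypothetical risk function by averaging each feature's marginal contribution over all feature orderings. The function `toy_risk`, its coefficients, and the binary feature encoding are invented for illustration and are not the study's model. Practical SHAP tooling uses faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over every ordering in which features switch from baseline to x."""
    n = len(x)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]
            new = predict(current)
            totals[i] += new - prev
            prev = new
    return [t / len(orders) for t in totals]


def toy_risk(features):
    """Hypothetical risk score with one interaction term (made-up weights)."""
    low_pao2, hypertension, high_d_dimer = features
    return (0.1 + 0.3 * low_pao2 + 0.2 * hypertension
            + 0.1 * high_d_dimer + 0.1 * low_pao2 * high_d_dimer)


x = [1, 1, 1]         # hypothetical patient with all three flags present
baseline = [0, 0, 0]  # reference patient with none of them
phi = shapley_values(toy_risk, x, baseline)

# The attributions sum to the gap between the two predictions.
gap = toy_risk(x) - toy_risk(baseline)
```

In this toy case the interaction term is shared equally between the two features involved, which is why the attributions differ slightly from the plain coefficients.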

## Linked entities

- **Diseases:** pulmonary embolism (MONDO:0005279), stroke (MONDO:0005098), heart failure (MONDO:0005252)

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** chest pain (MESH:D002637), stroke (MESH:D020521), PE (MESH:D011655), hypertension (MESH:D006973), heart failure (MESH:D006333), dyspnea (MESH:D004417), lung disease (MESH:D008171), DVT (MESH:D020246), thrombus (MESH:D013927)
- **Chemicals:** creatinine (MESH:D003404), oxygen (MESH:D010100)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12979134/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12979134/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12979134/full.md

---
Source: https://tomesphere.com/paper/PMC12979134