# Development of a venous thromboembolism risk prediction model for patients with primary membranous nephropathy based on machine learning

**Authors:** Lian Li, Liuyun Wu, Yin Wang, Hulin Wang, Xingyue Zheng, Lizhu Han, Qinan Yin, Xingwei Wu, Yuan Bian

PMC · DOI: 10.3389/fphar.2025.1683708 · 2025-11-06

## TL;DR

This study develops a machine learning model that predicts venous thromboembolism (VTE) risk in patients with primary membranous nephropathy (PMN), with the aim of supporting decisions on prophylactic anticoagulant therapy.

## Contribution

A novel machine learning-based VTE risk prediction model and web tool for primary membranous nephropathy patients.

## Key findings

- NGBoost was the best-performing VTE risk prediction model, with an AUC of 0.911.
- Ten features were identified as key predictors of VTE risk in PMN patients.
- An online predictive tool was developed for real-time individualized VTE risk assessment.
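The AUC in the first finding can be read as a ranking probability: the chance that a randomly chosen patient who developed VTE receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as half. The sketch below computes it directly from that pairwise definition in plain Python; `roc_auc` and the toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win,
    which matches the Mann-Whitney U formulation of the ROC AUC.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed VTE, 0 = did not.
y = [0, 0, 1, 1, 0, 1]
risk = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(roc_auc(y, risk))  # 8 of 9 pairs are ordered correctly, so AUC = 8/9
```

On this reading, an AUC of 0.911 means that about 91% of VTE/non-VTE patient pairs in the evaluation data were ranked in the correct order. The pairwise loop is O(P·N), which is fine for a few hundred patients; larger datasets usually use an equivalent rank-based formula.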

## Abstract

This study uses real-world data from patients with primary membranous nephropathy (PMN) to develop a preliminary machine learning model for predicting venous thromboembolism (VTE) risk. By predicting VTE risk in these patients, the model aims to support more rational use of prophylactic anticoagulant therapy.

We collected diagnostic and treatment data for PMN patients hospitalized at Sichuan Provincial People’s Hospital from 1 January 2018 to 30 September 2024. The data were divided into training and test sets at an 8:2 ratio and then processed using combinations of three imputation methods, three sampling methods, and three feature selection methods. After preprocessing, fourteen machine learning algorithms were used to develop predictive models of VTE risk in PMN patients. The SHapley Additive exPlanations (SHAP) method was used to interpret each feature's contribution to the predicted outcome. Finally, a VTE risk prediction tool for PMN patients was built with Streamlit.
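SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory. For a model with only a few features, the values can be computed exactly by enumerating feature coalitions, as in the sketch below. It assumes a single baseline patient whose values stand in for "absent" features; the SHAP library instead averages over background data and uses faster model-specific algorithms, and which explainer the study used is not stated here. `shapley_values`, `toy_risk`, and all numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction `model(x)`.

    A feature outside a coalition is set to its value in `baseline`. Every
    coalition is enumerated, so the cost grows as 2**n: tiny models only.
    """
    n = len(x)

    def value(coalition):
        # Coalition members take the patient's values; the rest take the baseline.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_risk(z):
    # Hypothetical risk score with one interaction term (not the study's model).
    return 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]


x = [2.0, 1.0, 1.0]         # hypothetical patient
baseline = [0.0, 0.0, 0.0]  # hypothetical reference patient
phi = shapley_values(toy_risk, x, baseline)
print([round(v, 3) for v in phi])  # [1.2, -0.3, 0.2]
```

The three attributions sum to `toy_risk(x) - toy_risk(baseline)` (the efficiency property), and the interaction term `0.2 * z[0] * z[2]` is split equally between features 0 and 2. Exact enumeration costs 2^n model calls per feature, so it does not scale much past a dozen features.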

A total of 643 patients with PMN were included in the study, of whom 93 developed VTE. Among the 504 models constructed, the NGBoost model combining K-nearest neighbor (KNN) imputation, Borderline-SMOTE sampling, and frequency-based feature selection was identified as optimal, achieving an area under the curve (AUC) of 0.911. The optimal model included ten features: D-dimer (DD), fibrin degradation products (FDP) > 5 mg/L, prothrombin international normalized ratio (INR), recurrent nephrotic syndrome (RNS), cholinesterase (CHE), urinary microalbumin-to-creatinine ratio (umALB/Ucr), statins, antithrombin III (AT III) activity, albumin, and anti-phospholipase A2 receptor antibody (aPLA2Rab). Finally, an online predictive tool based on the optimal model was developed to provide real-time, individualized VTE risk predictions for PMN patients.
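With 93 VTE cases among 643 patients, the outcome is imbalanced, which is what the sampling step addresses. The sketch below follows the Borderline-SMOTE1 idea in plain Python: only minority samples near the class boundary are used as seeds for synthetic cases. It is an illustrative reimplementation, not the library or settings the study used; `borderline_smote`, the neighbour counts `m` and `k`, and the toy data are hypothetical.

```python
import math
import random


def _k_nearest(i, X, pool, k):
    """Indices of the k points in `pool` closest to X[i] (Euclidean), excluding i."""
    others = [j for j in pool if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def borderline_smote(X, y, n_new, m=5, k=5, seed=0):
    """Borderline-SMOTE1 sketch for binary labels where 1 is the minority class.

    1. For each minority sample, find its m nearest neighbours in the whole set.
    2. Mark it "in danger" if at least half, but not all, of them are majority
       samples (all-majority neighbourhoods are treated as noise and skipped).
    3. Create each synthetic sample on the segment between a danger sample and
       one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    minority = [i for i in all_idx if y[i] == 1]
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    danger = []
    for i in minority:
        n_major = sum(1 for j in _k_nearest(i, X, all_idx, m) if y[j] == 0)
        if m / 2 <= n_major < m:
            danger.append(i)
    if not danger:
        return []  # no minority sample sits on the class boundary

    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        j = rng.choice(_k_nearest(i, X, minority, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Hypothetical 2-feature toy data: label 0 = no VTE, 1 = VTE.
X = [[0, 0], [0, 1], [1, 0], [1, 1],      # majority class
     [2, 1], [3, 3], [3, 2.5], [2.5, 3]]  # minority class
y = [0, 0, 0, 0, 1, 1, 1, 1]
new = borderline_smote(X, y, n_new=4, m=3, k=2)
print(len(new))  # 4; only [2, 1] is a borderline ("danger") sample here
```

Oversampling is normally applied to the training split only, so the test set keeps the real class ratio. In practice a maintained implementation such as imbalanced-learn's `BorderlineSMOTE` would be used instead of a hand-rolled version.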

Using machine learning techniques, this study developed a personalized VTE risk prediction model for PMN patients, together with a web-based tool that implements it. The model demonstrates strong predictive performance and can assist clinical decision-making on the prevention and treatment of VTE in PMN patients.

## Linked entities

- **Diseases:** venous thromboembolism (MONDO:0005399)

## Full-text entities

- **Genes:** SERPINC1 (serpin family C member 1) [NCBI Gene 462] {aka AT3, AT3D, ATIII, ATIII-R2, ATIII-T1, ATIII-T2}, ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, F2 (coagulation factor II, thrombin) [NCBI Gene 2147] {aka PT, RPRGL2, THPH1}, BCHE (butyrylcholinesterase) [NCBI Gene 590] {aka BCHED, CHE1, CHE2, E1}
- **Diseases:** VTE (MESH:D054556), RNS (MESH:D009404), PMN (MESH:D015433)
- **Chemicals:** Microalbumin (-), Creatinine (MESH:D003404)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12631078/full.md

---
Source: https://tomesphere.com/paper/PMC12631078