Development of Venous Thromboembolism Risk Prediction Models Based on Whole Blood Gene Expression Profiling Using 20 Machine Learning Algorithms: Comprehensive Analysis Study
Yedong Huang, Xiaoyun Chen, Guannan Bai, Yajun Zhao, Dapeng Kuang, Lin Zhang, Wei Lu

TL;DR
This study developed venous thromboembolism (VTE) risk prediction models from whole blood gene expression data using 20 machine learning algorithms; nine of the models achieved an area under the curve greater than 0.75 in external validation.
Contribution
The novel use of whole blood gene expression profiling with 20 ML algorithms to build and validate VTE prediction models.
Findings
Nine machine learning models achieved an area under the curve greater than 0.75 in external validation.
Most models maintained high specificity in external validation cohorts.
Combining these models with D-dimer could improve VTE diagnosis.
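The two metrics named in these findings, area under the curve (AUC) and specificity, can be illustrated with a minimal, dependency-free sketch. The labels, scores, and the 0.5 decision threshold below are made-up values for illustration only; they are not data or settings from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((c > h) + 0.5 * (c == h) for c in cases for h in controls)
    return wins / (len(cases) * len(controls))


def specificity(labels, scores, threshold=0.5):
    """Fraction of controls (label 0) scored below the decision threshold."""
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not control_scores:
        raise ValueError("at least one control is required")
    true_negatives = sum(1 for s in control_scores if s < threshold)
    return true_negatives / len(control_scores)


# Hypothetical validation cohort: 1 = VTE, 0 = healthy control.
demo_labels = [1, 1, 1, 0, 0, 0, 0]
demo_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

print("AUC:", round(roc_auc(demo_labels, demo_scores), 3))
print("Specificity:", specificity(demo_labels, demo_scores))
```

AUC is threshold-free, whereas specificity depends on the chosen cutoff, so a model can clear an AUC bar such as 0.75 and still show different specificity at different thresholds.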
Abstract
There is a lack of venous thromboembolism (VTE) risk prediction models based on gene expression information. This study aimed to construct a VTE prediction model based on whole blood gene expression profiling by performing a comprehensive analysis of 20 machine learning (ML) algorithms. Two transcriptome datasets containing patients with VTE and healthy controls were obtained by searching the Gene Expression Omnibus database and used as the training and validation sets, respectively. Feature selection for model construction was performed on the training set using the least absolute shrinkage and selection operator (LASSO) and random forest, and the intersection of the features chosen by the two methods was retained. Recursive feature elimination was then applied to further refine the selected features. Models were constructed from the refined features using 20 ML algorithms. The…
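The feature selection steps described in the abstract (LASSO and random forest, their intersection, then recursive feature elimination) can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data, not the authors' pipeline: the function name `select_features`, the use of linear `Lasso` on 0/1 labels, the top-20 random forest cutoff, the logistic regression estimator inside RFE, and every hyperparameter are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.preprocessing import StandardScaler


def select_features(X, y, alpha=0.02, n_rf_top=20, n_final=5, seed=0):
    """Return sorted column indices kept by LASSO-and-RF intersection, then RFE.

    All defaults are illustrative assumptions, not values from the study.
    """
    X_std = StandardScaler().fit_transform(X)

    # Step 1a: LASSO keeps the features with nonzero coefficients.
    lasso = Lasso(alpha=alpha).fit(X_std, y)
    lasso_idx = set(np.flatnonzero(lasso.coef_).tolist())

    # Step 1b: random forest keeps the n_rf_top most important features.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_std, y)
    ranked = np.argsort(forest.feature_importances_)[::-1]
    rf_idx = set(ranked[:n_rf_top].tolist())

    # Step 2: intersection of the two selections.
    shared = sorted(lasso_idx & rf_idx)
    if len(shared) <= n_final:
        return shared

    # Step 3: recursive feature elimination refines the shared features.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_final)
    rfe.fit(X_std[:, shared], y)
    return [int(f) for f, keep in zip(shared, rfe.support_) if keep]


# Synthetic stand-in for an expression matrix (samples x genes).
X_demo, y_demo = make_classification(
    n_samples=120, n_features=50, n_informative=6, n_redundant=0,
    n_clusters_per_class=1, shuffle=False, random_state=0,
)
selected = select_features(X_demo, y_demo)
print("Selected feature indices:", selected)
```

Taking the intersection before running RFE keeps only features that both a sparse linear method and a tree ensemble consider useful, which shrinks the candidate set that the more expensive elimination step has to search.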
Taxonomy
Topics: Imbalanced Data Classification Techniques · Venous Thromboembolism Diagnosis and Management · Blood Coagulation and Thrombosis Mechanisms
