# A stacking ensemble model for predicting the occurrence of carotid atherosclerosis

**Authors:** Xiaoshuai Zhang, Chuanping Tang, Shuohuan Wang, Wei Liu, Wangxuan Yang, Di Wang, Qinghuan Wang, Fang Tang

PMC · DOI: 10.3389/fendo.2024.1390352 · Frontiers in Endocrinology · 2024-07-23

## TL;DR

This study builds a stacking ensemble of five machine learning models to predict carotid atherosclerosis (CAS) from routine health check-up data, and uses SHAP values to assess the contribution of endocrine-related factors.

## Contribution

The study introduces a stacking ensemble model that predicts CAS more accurately than its individual base models and highlights the contribution of endocrine-related markers.

## Key findings

- The ensemble model achieved an AUC of 0.893 in the testing set and 0.861 in the validation set.
- Carotid stenosis and age were the most significant predictors of CAS.
- Endocrine-related variables showed notable contributions to CAS risk prediction.

## Abstract

Carotid atherosclerosis (CAS) is a significant risk factor for cardio-cerebrovascular events. The objective of this study is to employ stacking ensemble machine learning techniques to enhance the prediction of CAS occurrence, incorporating a wide range of predictors, including endocrine-related markers.

Using data from a routine health check-up cohort, five individual prediction models for CAS were built with logistic regression (LR), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost) and gradient boosting decision tree (GBDT) methods. A stacking ensemble algorithm was then used to integrate the base models, both to improve predictive ability and to reduce overfitting. Finally, the SHAP value method was applied for an in-depth analysis of variable importance at both the overall and individual levels, with a focus on elucidating the impact of endocrine-related variables.
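The stacking step described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn, uses synthetic data shaped like the cohort (1669 rows, 17 predictors), leaves out XGBoost to avoid an extra dependency, and picks a logistic regression meta-learner, the hyperparameters, and the 70/30 split purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cohort (assumption): 1669 subjects,
# 17 predictors, roughly 26% positive cases.
X, y = make_classification(
    n_samples=1669,
    n_features=17,
    n_informative=8,
    weights=[0.74, 0.26],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base learners; XGBoost is omitted here, GBDT is scikit-learn's implementation.
base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC())),
    ("gbdt", GradientBoostingClassifier(random_state=0)),
]

# The meta-learner is trained on out-of-fold predictions of the base models
# (cv=5), which is what limits overfitting in a stacked ensemble.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"stacked test AUC on synthetic data: {test_auc:.3f}")
```

The printed AUC reflects the synthetic data only and says nothing about the values reported in the paper.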

A total of 441 of the 1669 subjects in the cohort (26.4%) were finally diagnosed with CAS. Seventeen variables were selected as predictors. The ensemble model outperformed the individual models, with AUCs of 0.893 in the testing set and 0.861 in the validation set. It had the best accuracy, precision, recall and F1 score in the validation set, and also performed well in the testing set. Carotid stenosis and age emerged as the most significant predictors, alongside notable contributions from endocrine-related factors.
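For readers less familiar with the metrics reported above, the following sketch shows how accuracy, precision, recall, F1 and AUC are computed from labels and predicted scores. The toy labels, the scores and the 0.5 decision threshold are invented for illustration and are unrelated to the study data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = CAS)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
print(metrics, round(toy_auc, 3))
```

Unlike the other four metrics, AUC does not depend on the decision threshold, which is one reason it is commonly used to compare models.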

The ensemble model shows enhanced accuracy and generalizability in predicting CAS risk, underscoring its utility in identifying individuals at high risk. This approach integrates a comprehensive analysis of predictors, including endocrine markers, affirming the critical role of endocrine dysfunctions in CAS development. It represents a promising tool for the prevention of CAS and cardio-cerebrovascular diseases.

## Full-text entities

- **Diseases:** cardio-cerebrovascular (MESH:D059347), endocrine dysfunctions (MESH:D004700), CAS (MESH:D002340), Carotid stenosis (MESH:D016893), cardio-cerebrovascular diseases (MESH:D002561)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11300245/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11300245/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC11300245/full.md

---
Source: https://tomesphere.com/paper/PMC11300245