# Development of an explainable machine learning model for 3-year cardiovascular risk prediction in new-onset type 2 diabetes using the TyG index and ultrasound features

**Authors:** Zhen-zhen Jiang, Yan-feng Jiang, Yue Wang, Ying Zhou, Rong-li Peng, Cai-ye Ma, Xia-tian Liu

PMC · DOI: 10.1186/s12911-025-03247-6 · BMC Medical Informatics and Decision Making · 2025-11-04

## TL;DR

This study develops a machine learning model that predicts 3-year cardiovascular risk in patients with new-onset type 2 diabetes from clinical and ultrasound data. The model outperforms the Framingham Risk Score (validation AUC 0.772 vs. 0.608).

## Contribution

A machine learning model that combines clinical data, including the triglyceride-glucose (TyG) index, with ultrasound features to improve cardiovascular risk prediction in patients with new-onset T2D.
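
For context, the TyG index is commonly defined in the literature as ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2). This summary does not state the exact formula or units the authors used, so the sketch below assumes that common definition. The function name `tyg_index` and the example values are hypothetical.

```python
import math


def tyg_index(triglycerides_mg_dl: float, glucose_mg_dl: float) -> float:
    """Return the TyG index under the commonly used definition.

    Assumption: both inputs are fasting values in mg/dL. The paper's own
    units and formula are not given in this summary.
    """
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("laboratory values must be positive")
    # ln(TG * glucose / 2), the widely cited form of the index
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


if __name__ == "__main__":
    # Hypothetical patient: triglycerides 150 mg/dL, fasting glucose 100 mg/dL
    print(round(tyg_index(150.0, 100.0), 2))  # prints 8.92
```

Values reported in mmol/L would need converting to mg/dL first; that step is left out here because the source does not say which units were recorded.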

## Key findings

- The LightGBM model achieved an AUC of 0.845 in the training cohort and 0.772 in the validation cohort, outperforming the Framingham Risk Score (AUC 0.672 and 0.608, respectively; P < 0.05).
- SHAP analysis provided individualized interpretability and clinical insights into the model's predictions.
- A web-based tool was developed for real-time clinical application of the model.
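
The AUC values reported above can be read as the probability that a randomly chosen patient who developed CVD receives a higher predicted risk than a randomly chosen patient who did not. A minimal sketch of that pairwise definition follows. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the study's data or code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Compute the ROC AUC by comparing every positive/negative pair.

    labels: 1 for an event (e.g. CVD within follow-up), 0 otherwise.
    scores: predicted risk, higher meaning more likely to have the event.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # event case ranked above non-event case
            elif p == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: four patients, two with events
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80]
    print(roc_auc(toy_labels, toy_scores))  # prints 0.75
```

The pairwise loop is quadratic in cohort size, which is fine for illustration. A rank-based implementation would be the usual choice for a cohort of several thousand patients.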

## Abstract

New-onset type 2 diabetes (T2D) is associated with increased cardiovascular risk and requires tailored prevention strategies. Traditional risk factors and assessment tools may not accurately predict cardiovascular disease (CVD) in this population. In our study, we compared different machine learning (ML) methods to predict the 3-year risk of developing CVD in new-onset T2D patients and developed models combining clinical data and ultrasound features for better risk evaluation.

A group of 3,358 hospitalized T2D patients was screened. ML models were developed and evaluated. Feature selection was conducted via SHapley Additive exPlanations (SHAP) and recursive feature elimination to improve both the model’s performance and its interpretability. The optimal model was subsequently compared with the Framingham Risk Score (FRS). Ultimately, the model was employed for risk stratification.

Of the ML models developed, the LightGBM model, built on six features (hypertension, age, the triglyceride-glucose [TyG] index, plaque burden, maximum plaque thickness, and intima-media thickness), achieved robust performance (AUC 0.845 in the training cohort and 0.772 in the validation cohort). The model outperformed the traditional FRS (AUC 0.672 in the training cohort and 0.608 in the validation cohort, P < 0.05). SHAP analysis enabled individualized interpretability and clinical insights. A web-based tool was deployed to facilitate clinical application.

By integrating clinical and imaging data, with a focus on the TyG index and ultrasound features, the predictive model developed in this study demonstrated enhanced predictive capability for CVD incidence in individuals with new-onset T2D. It also allows easy risk classification and is available as a web tool for real-time use, helping improve early detection and personalized care.

The online version contains supplementary material available at 10.1186/s12911-025-03247-6.

## Linked entities

- **Diseases:** type 2 diabetes (MONDO:0005148), cardiovascular disease (MONDO:0004995)

## Full-text entities

- **Diseases:** type 2 diabetes (MESH:D003924)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12584365/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12584365/full.md

## References

2 references — full list in the complete paper: https://tomesphere.com/paper/PMC12584365/full.md

---
Source: https://tomesphere.com/paper/PMC12584365