# Machine Learning–Based Risk Prediction for Coronary Heart Disease Complicated by Hyperhomocysteinemia: Retrospective Study

**Authors:** Ming-Yuan Du, Meng-Ke Lyu, Hai-long Liu, Yi-zhuo Li, Hai-feng Yan, Xiao-hui Li

PMC · DOI: 10.2196/80809 · JMIR Medical Informatics · 2026-03-19

## TL;DR

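This study uses machine learning to predict coronary heart disease (CHD) risk in patients with high homocysteine levels (hyperhomocysteinemia), and identifies age and activated partial thromboplastin time (a blood clotting test) as the most influential predictors.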

## Contribution

A LightGBM-based model with SHAP interpretation is developed for CHD risk prediction in patients with hyperhomocysteinemia and compared against six other machine learning models.

## Key findings

- The LightGBM model achieved a test-set AUC of 0.807 and good calibration (Brier score 0.2415) for CHD risk prediction.
- Age and activated partial thromboplastin time were the most influential predictors identified via SHAP analysis.
- A correlation heat map showed low collinearity among the six input variables, supporting model stability and clinical applicability.
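The collinearity claim rests on pairwise correlations among the input variables, which is what a correlation heat map displays. The sketch below is a minimal, assumption-laden illustration, not the authors' code: it computes a Pearson correlation matrix in plain Python and flags pairs above a cutoff. The 0.7 threshold and the function names are hypothetical choices for illustration; the paper's exact correlation method and cutoff are not stated in this summary.

```python
import math
from typing import Dict, List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences.

    For 0/1-coded variables (e.g. hypertension) this reduces to the
    point-biserial or phi coefficient.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant variable has undefined correlation")
    return cov / (sx * sy)

def correlation_matrix(data: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations: the numbers behind a correlation heat map."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

def flag_collinear(matrix: Dict[str, Dict[str, float]],
                   threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """Return each variable pair (upper triangle only) with |r| above threshold.

    The 0.7 cutoff is a common rule of thumb (hypothetical here), not a value
    taken from the paper.
    """
    names = list(matrix)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(matrix[a][b]) > threshold:
                flagged.append((a, b, matrix[a][b]))
    return flagged
```

Given a dict mapping each of the six study variables to its column of values, an empty list from `flag_collinear(correlation_matrix(data))` would correspond to the "low collinearity" finding under this assumed cutoff.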

## Abstract

Hyperhomocysteinemia (HHcy) is recognized as an independent risk factor for coronary heart disease (CHD), yet accurately predicting CHD risk in patients with HHcy remains a challenge. This study aimed to develop and validate multiple machine learning models for predicting CHD risk in patients with HHcy and elucidate key predictors using Shapley Additive Explanation (SHAP) algorithms.
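SHAP attributes a prediction to its input features using Shapley values from cooperative game theory; for tree ensembles such as LightGBM, SHAP tooling uses the TreeSHAP algorithm to compute them efficiently. The sketch below is not that algorithm. It computes exact Shapley values by brute-force enumeration of feature coalitions, under the simplifying assumption that a feature "absent" from a coalition is replaced by a single baseline value. The toy risk function and all names are hypothetical illustrations, not the study's model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[List[float]], float]

def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature outside the coalition takes its baseline value (an assumption;
    SHAP tooling typically averages over a background dataset instead).
    Cost grows as 2**n, which is still small for six predictors (64 coalitions).
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Coalition members keep the patient's value; others use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy risk score with an interaction between features 0 and 1.
def toy_model(z: List[float]) -> float:
    return z[0] * z[1] + z[2]

x_patient = [2.0, 3.0, 1.0]
x_baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x_patient, x_baseline)
print(phi)
```

The values satisfy the efficiency property that SHAP summary plots rely on: they sum to `model(x) - model(baseline)`, so each predictor's share of an individual risk estimate can be read off directly.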

The objective was to develop and validate machine learning models for predicting CHD risk in patients with HHcy, in order to improve early risk stratification and clinical decision-making.

This single-center retrospective study used electronic medical record data from patients diagnosed with HHcy. Patients were randomly divided into training (n=364, 70%), validation (n=78, 15%), and test (n=78, 15%) sets. Seven machine learning models were constructed: logistic regression, k-nearest neighbor, decision tree, random forest, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and a stacking ensemble. Six core variables (age, weight, hypertension, continuous drinking history, activated partial thromboplastin time, and carotid plaque) served as inputs. Performance was evaluated using the area under the receiver operating characteristic curve, accuracy, F1-score, calibration curves, Brier score, and decision curve analysis. SHAP interpretation was then applied to the best-performing model, LightGBM.
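The reported set sizes follow directly from a 70/15/15 partition of 520 patients (364 + 78 + 78). A minimal sketch of such a random split in plain Python is shown below. The seed and function name are hypothetical, and the summary does not say whether the split was stratified by CHD status, so this version is a simple unstratified shuffle.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_70_15_15(records: Sequence[T],
                   seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle records and cut them into training, validation, and test sets.

    Hypothetical helper: an unstratified split with a fixed seed for
    reproducibility. The test set takes whatever remains after rounding.
    """
    items = list(records)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = round(n * 0.70)
    n_val = round(n * 0.15)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = split_70_15_15(range(520))
print(len(train), len(val), len(test))
```

In a typical workflow of this kind, the validation set guides model selection and tuning, while the held-out test set supplies the final reported metrics.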

The LightGBM model performed best on the test set (area under the receiver operating characteristic curve=0.807, F1-score=0.606), showed good calibration (Brier score=0.2415), and yielded a high clinical net benefit on decision curve analysis. SHAP analysis identified age and activated partial thromboplastin time as the most influential predictors, followed by hypertension, weight, carotid plaque, and continuous drinking history. The correlation heat map showed low collinearity among the input variables, supporting model stability.
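The metrics reported above have standard definitions that can be sketched in a few lines of plain Python: AUC as the probability that a randomly chosen CHD case is scored above a randomly chosen non-case (ties count one half), F1 from the confusion counts at a probability cutoff, the Brier score as the mean squared error of predicted probabilities, and decision-curve net benefit at threshold probability `pt` as TP/N − FP/N × pt/(1 − pt). This is an illustrative sketch, not the study's evaluation code; the 0.5 cutoff for F1 is an assumption, since the summary does not state the cutoff used.

```python
from typing import Sequence

def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores (ties = 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def f1_at(y_true: Sequence[int], y_prob: Sequence[float],
          threshold: float = 0.5) -> float:
    """F1-score after classifying probabilities >= threshold as positive (assumed cutoff)."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit of treating patients whose risk is >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie in (0, 1)")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1.0 - pt)

# Hypothetical toy outcomes and predicted probabilities.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.7]
print(roc_auc(y, p), f1_at(y, p), brier(y, p), net_benefit(y, p, 0.5))
```

Decision curve analysis repeats `net_benefit` over a range of threshold probabilities and compares the resulting curve with the "treat all" and "treat none" strategies.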

The LightGBM model demonstrated high accuracy and interpretability in forecasting CHD risk among patients with HHcy. The integration of machine learning and interpretable artificial intelligence methods holds promise for delivering personalized early risk assessment and intervention strategies in clinical settings.

## Linked entities

- **Diseases:** coronary heart disease (MONDO:0005010), hyperhomocysteinemia (MONDO:0004743)

## Full-text entities

- **Genes:** FGB (fibrinogen beta chain) [NCBI Gene 2244] {aka HEL-S-78p}, PGR (progesterone receptor) [NCBI Gene 5241] {aka NR3C3, PR}
- **Diseases:** congenital heart disease (MESH:D006330), arrhythmia (MESH:D001145), vascular endothelial impairment (MESH:D014652), hypercoagulable (MESH:D019851), hypertension (MESH:D006973), carotid plaque (MESH:D016893), hepatorenal insufficiency (MESH:D006530), CHD (MESH:D003327), tumors (MESH:D009369), atherosclerosis (MESH:D050197), diabetes (MESH:D003920), plaques (MESH:D003773), myocarditis (MESH:D009205), cardiovascular diseases (MESH:D002318), cognitive deficits (MESH:D003072), mental illness (MESH:D001523), heart ailments (MESH:D006331), HHcy (MESH:D020138), Acute coronary syndromes (MESH:D054058), TIA (MESH:D002546)
- **Chemicals:** homocysteine (MESH:D006710), creatinine (MESH:D003404), triglycerides (MESH:D014280), alcohol (MESH:D000438), cholesterol (MESH:D002784)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13002003/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13002003/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC13002003/full.md

---
Source: https://tomesphere.com/paper/PMC13002003