# Web-based cardiovascular disease risk prediction using machine learning

**Authors:** Suraiya Akhter, John H. Miller

PMC · DOI: 10.3389/frai.2026.1690664 · Frontiers in Artificial Intelligence · 2026-02-13

## TL;DR

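This paper presents a web-based tool that predicts cardiovascular disease (CVD) risk from NHANES data using a support vector machine trained on hypergraph-selected features (82.84% accuracy, AUC 0.9027), with SHAP plots for interpretability.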

## Contribution

The paper compares four feature-selection strategies across random forest, SVM, and XGBoost models, and deploys the best-performing combination, Hypergraph-Based Feature Evaluation (HFE) with SVM, as a web application for interpretable CVD risk prediction.

## Key findings

- Hypergraph-Based Feature Evaluation (HFE) combined with SVM achieved the highest accuracy (82.84%) and AUC (0.9027) for CVD risk prediction.
- Key predictors included age, total cholesterol, history of high blood pressure, medication use, lifetime smoking history, and socioeconomic factors such as family income-to-poverty ratio and educational attainment.
- The web application provides predictive results, probability scores, and SHAP plots for model interpretability.
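The two headline metrics in the first finding, accuracy and AUC, can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the labels and scores below are made up, the 0.5 decision threshold is an assumption, and the function names `accuracy` and `roc_auc` are hypothetical. AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = CVD) and predicted probabilities, for illustration only.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"AUC      = {roc_auc(labels, scores):.4f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for a toy example; a rank-based computation would be the usual choice for a dataset of NHANES size. Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are reported together.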

## Abstract

Cardiovascular disease (CVD) remains the foremost contributor to global illness and death, underscoring the critical need for effective tools that can predict risk at early stages to support preventive care and timely clinical decisions. With the growing complexity of healthcare data, machine learning has shown considerable promise in extracting insights that enhance medical decision-making. Nonetheless, the effectiveness and clarity of machine learning models largely rely on the relevance and quality of input features. In this work, we explored and compared four feature-selection strategies—Pearson correlation + Chi-squared test, Alternating Decision Tree (ADT)-based scoring, Cross-Validated Feature Evaluation (CVFE), and Hypergraph-Based Feature Evaluation (HFE)—to identify the most predictive factors for CVD risk. Our analysis utilized data from the National Health and Nutrition Examination Survey (NHANES), administered by the National Center for Health Statistics under the Centers for Disease Control and Prevention (CDC), encompassing demographic, clinical, laboratory, and survey data collected across the U.S. from August 2021 through August 2023. Distinct sets of features obtained through these selection techniques were used to develop random forest (RF), support vector machine (SVM), and eXtreme Gradient Boosting (XGBoost) models, which were then assessed for predictive effectiveness. To improve clarity and understanding of model decision-making, SHapley Additive exPlanations (SHAP) was used to interpret feature contributions in the top-performing model. Among the evaluated methods, the HFE approach combined with SVM achieved the highest overall accuracy (82.84%) and AUC (0.9027), outperforming both classical and alternative strategies. 
The most influential predictors included age, total cholesterol, history of high blood pressure, use of cholesterol-lowering medication, recent prescription medication use, lifetime smoking history, family income-to-poverty ratio, gender, educational attainment, and red cell distribution width. The web application, accessible at https://shiny.tricities.wsu.edu/cvdr-prediction/, presents predictive results, probability scores, and SHAP plots generated from the model trained using the feature set selected by the hypergraph-based approach. This study highlights the importance of strategic feature selection in refining predictive accuracy and interpretability, offering a practical data-driven approach that could aid clinicians in evaluating cardiovascular risk and tailoring preventive care.
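Of the four feature-selection strategies named in the abstract, the Pearson correlation + Chi-squared filter is the simplest to sketch. The example below is a hedged illustration, not the authors' pipeline: the toy columns `age`, `ever_smoked`, and `cvd` are invented, the function names are hypothetical, and this summary does not state how the paper thresholds or combines the two scores. It assumes Pearson correlation is applied to numeric features and the chi-squared statistic to categorical ones, each scored against the binary outcome.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def chi2_statistic(feature: Sequence[int], label: Sequence[int]) -> float:
    """Pearson chi-squared statistic of independence for two categorical columns."""
    n = len(feature)
    stat = 0.0
    for f in sorted(set(feature)):
        row_total = sum(1 for v in feature if v == f)
        for lab in sorted(set(label)):
            col_total = sum(1 for v in label if v == lab)
            observed = sum(1 for v, t in zip(feature, label) if v == f and t == lab)
            expected = row_total * col_total / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat


# Invented toy data: eight respondents, binary outcome (1 = CVD).
cvd = [0, 0, 0, 0, 1, 1, 1, 1]
age = [30, 35, 40, 45, 50, 55, 60, 65]   # numeric feature
ever_smoked = [0, 0, 0, 1, 1, 1, 1, 0]   # categorical feature

age_score = abs(pearson_r(age, cvd))
smoke_score = chi2_statistic(ever_smoked, cvd)

print(f"|Pearson r| for age       = {age_score:.4f}")
print(f"chi-squared for smoking   = {smoke_score:.4f}")
```

In a filter approach like this, each feature is scored independently of any classifier and the highest-scoring features are retained. The two statistics are on different scales, so in practice they would be compared through p-values or ranked within their own feature type rather than against each other directly.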

## Linked entities

- **Diseases:** Cardiovascular disease (MONDO:0004995), high blood pressure (MONDO:0005044)

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** inflammation (MESH:D007249), heart and circulatory system disorders (MESH:D012769), coronary heart disease (MESH:D003327), diabetes (MESH:D003920), obesity (MESH:D009765), stroke (MESH:D020521), metabolic dysregulation (MESH:D021081), death (MESH:D003643), hypertension (MESH:D006973), CVD (MESH:D002318), heart attack (MESH:D009203), heart failure (MESH:D006333), rheumatic heart conditions (MESH:D012214), peripheral artery disease (MESH:D058729), cardiovascular and coronary artery disease (MESH:D003324), congenital cardiovascular defects (MESH:D018376)
- **Chemicals:** cholesterol (MESH:D002784), BPQ020 (-), lipid (MESH:D008055)
- **Species:** Nicotiana tabacum (American tobacco, species) [taxon 4097], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12946134/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12946134/full.md

## References

61 references — full list in the complete paper: https://tomesphere.com/paper/PMC12946134/full.md

---
Source: https://tomesphere.com/paper/PMC12946134