# An interpretability heart disease prediction model based on stacking ensemble with SHAP

**Authors:** Yanjie Chen, Liqiang Chong, Zhenghao Bao, Shaoqiang Wang, Yuchen Wang, Yanan Feng

PMC · DOI: 10.3389/fmolb.2025.1763157 · Frontiers in Molecular Biosciences · 2026-02-20

## TL;DR

This study builds an interpretable heart disease prediction model with stacking ensemble learning and uses SHAP to identify key risk factors such as age and sleep duration.

## Contribution

A novel two-layer stacking ensemble model with SHAP-based interpretability for heart disease prediction is proposed.

## Key findings

- The stacking model achieved 86.69% accuracy and, unlike single learners, balanced precision and recall.
- Age, sleep duration, self-rated health, and BMI were identified as critical cardiovascular risk factors.
- Maintaining 7-8 hours of sleep significantly reduces heart disease risk according to local interpretive analysis.
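The abstract reports support-weighted precision, recall and F1-score. As a minimal sketch, assuming the usual convention (per-class metrics averaged with weights proportional to each class's count in the true labels, as in scikit-learn's `average="weighted"`), the computation looks like this in pure Python. The label vectors are hypothetical toy data, not the study's results.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over all labels."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    labels = sorted(set(y_true) | set(y_pred))
    prec_w = rec_w = f1_w = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support[c] if support[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[c] / n  # classes absent from y_true get weight 0
        prec_w += weight * prec
        rec_w += weight * rec
        f1_w += weight * f1
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "precision": prec_w, "recall": rec_w, "f1": f1_w}


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = weighted_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.4f}")
```

Because each class's recall is weighted by that class's share of the true labels, weighted recall always reduces to the total number of true positives divided by the number of samples, which is plain accuracy. This is why 86.69% appears for both accuracy and weighted recall. Weighted F1 is an average of per-class F1 scores, so it need not equal the harmonic mean of weighted precision and weighted recall.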

## Abstract

In the big data era, healthcare data has grown exponentially, presenting opportunities to explore the pathogenesis of heart disease. Clarifying the correlations between health indicators and heart disease is crucial for early prevention. This study employs ensemble learning to identify the key influencing factors, assisting clinicians in understanding the pathogenesis and enhancing prediction strategies.

A two-layer stacking ensemble model is proposed, integrating Naive Bayes, Decision Trees, CatBoost and Gradient Boosting Trees to enhance prediction accuracy. To address ensemble models’ complexity and poor interpretability, the SHAP technique is introduced to visualize the decision-making logic of the ensemble model.
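The two-layer design can be sketched as follows. Level-0 learners are trained on k-1 folds and score the held-out fold, so the level-1 meta-learner sees only out-of-fold probabilities and cannot simply learn to trust a base learner that overfits. This is a minimal pure-Python sketch under several assumptions: `fit_centroid` and `fit_stump` are hypothetical toy stand-ins for Naive Bayes and a decision tree (CatBoost and gradient boosting are omitted), the logistic-regression meta-learner is an assumption because the abstract does not name one, and the data are synthetic Gaussian blobs.

```python
import math
import random


def fit_centroid(X, y):
    """Hypothetical toy level-0 learner (stand-in for Naive Bayes): distance to class centroids."""
    def centroid(label):
        rows = [r for r, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def predict(r):
        d0, d1 = math.dist(r, c0), math.dist(r, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5  # closer to c1 -> higher P(y=1)
    return predict


def fit_stump(X, y):
    """Hypothetical toy level-0 learner (stand-in for a decision tree): one split on one feature."""
    best = (float("inf"), 0, 0.0, 0.5, 0.5)
    for j in range(len(X[0])):
        for thr in sorted({r[j] for r in X}):
            left = [t for r, t in zip(X, y) if r[j] <= thr]
            right = [t for r, t in zip(X, y) if r[j] > thr]
            if not left or not right:
                continue
            # Misclassifications if each side predicts its majority class.
            err = min(sum(left), len(left) - sum(left)) + min(sum(right), len(right) - sum(right))
            if err < best[0]:
                best = (err, j, thr, sum(left) / len(left), sum(right) / len(right))
    _, j, thr, p_left, p_right = best
    return lambda r: p_left if r[j] <= thr else p_right


def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Assumed level-1 meta-learner: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0

    def prob(z):
        return 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))

    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = prob(z) - t
            grad_w = [g + err * zi for g, zi in zip(grad_w, z)]
            grad_b += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(Z)
    return prob


def fit_stacking(X, y, base_fitters, k=5):
    """Two-layer stacking: out-of-fold base-learner probabilities train the meta-learner."""
    n = len(X)
    meta_X = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for m, fit in enumerate(base_fitters):
            model = fit(X_tr, y_tr)
            for i in range(fold, n, k):  # held-out rows of this fold
                meta_X[i][m] = model(X[i])
    meta = fit_logistic(meta_X, y)
    # Refit each base learner on all training data for use at prediction time.
    final = [fit(X, y) for fit in base_fitters]
    return lambda r: meta([model(r) for model in final])


def make_blobs(n, seed):
    """Synthetic two-class data: class 1 is shifted by +2 on both features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)])
        y.append(label)
    return X, y


X_train, y_train = make_blobs(400, seed=1)
X_test, y_test = make_blobs(400, seed=2)
model = fit_stacking(X_train, y_train, [fit_centroid, fit_stump], k=5)
accuracy = sum((model(r) >= 0.5) == bool(t) for r, t in zip(X_test, y_test)) / len(X_test)
print(f"stacked accuracy: {accuracy:.3f}")
```

In practice, scikit-learn's `StackingClassifier` implements the same out-of-fold scheme through its `cv` parameter, so real base learners such as Naive Bayes, decision trees and gradient boosting can be plugged in directly.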

Experimental results show that the stacking model achieved 86.69% accuracy, 87.14% weighted precision, 86.69% weighted recall, and 86.91% weighted F1-score. It balances precision and recall, unlike single learners that prioritize one over the other. Global interpretive analysis demonstrates that age, sleep duration, self-rated health status and BMI are critical factors in assessing cardiovascular risk. Local interpretive analysis is conducted to evaluate the contribution of each feature to the prediction results of individual samples.
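SHAP attributes a prediction to individual features using Shapley values. For a handful of features these can be computed exactly by enumerating every feature coalition, which makes the local (per-patient) and global (mean absolute value) views concrete. The sketch below is illustrative only: `toy_risk`, its coefficients, the feature coding and the background rows are all hypothetical rather than the paper's fitted stacking model, and filling in "missing" features from background rows is an assumed value function. Exact enumeration costs on the order of 2^n model evaluations, which is why libraries such as the `shap` package rely on model-specific or sampling-based estimators instead.

```python
import itertools
import math

FEATURES = ["age", "sleep_hours", "self_rated_health", "bmi"]


def toy_risk(features):
    """Hypothetical risk model; self-rated health coded 1 (excellent) to 5 (poor)."""
    age, sleep, health, bmi = features
    # Sleep penalty grows with distance from 7.5 h (an assumption for illustration).
    z = 0.05 * (age - 50) + 0.4 * abs(sleep - 7.5) + 0.3 * (health - 3) + 0.04 * (bmi - 25)
    return 1.0 / (1.0 + math.exp(-z))


def coalition_value(model, x, background, subset):
    """v(S): mean output when features in S come from x and the rest from each background row."""
    total = 0.0
    for b in background:
        mixed = [x[j] if j in subset else b[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for combo in itertools.combinations(others, size):
                subset = set(combo)
                gain = (coalition_value(model, x, background, subset | {i})
                        - coalition_value(model, x, background, subset))
                phi[i] += weight * gain
    return phi


def global_importance(model, X, background):
    """Global view: mean absolute Shapley value of each feature over the rows in X."""
    all_phi = [shapley_values(model, x, background) for x in X]
    return [sum(abs(p[j]) for p in all_phi) / len(X) for j in range(len(X[0]))]


# Hypothetical background cohort and patient.
background = [[45, 7.5, 2, 24], [60, 6.0, 3, 28], [38, 8.0, 1, 22], [70, 5.5, 4, 31]]
patient = [68, 5.0, 4, 30]

phi = shapley_values(toy_risk, patient, background)  # local explanation
for name, value in zip(FEATURES, phi):
    print(f"{name:>18}: {value:+.4f}")

# For brevity, the global ranking explains the background rows themselves.
importance = global_importance(toy_risk, background, background)
for name, value in sorted(zip(FEATURES, importance), key=lambda p: -p[1]):
    print(f"{name:>18}: {value:.4f}")
```

The efficiency property guarantees that a patient's Shapley values sum to the gap between that patient's prediction and the average background prediction, so each local attribution reads as a share of that patient's raised or lowered risk.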

The stacking model’s superior performance demonstrates that ensemble learning can overcome the limitations of single learners. Additionally, key predictive factors are identified: maintaining an average sleep duration of 7-8 hours significantly reduces heart disease risk, while advanced age and poor health status increase susceptibility. This study provides a reliable predictive tool for personalized heart disease prevention and treatment.

## Linked entities

- **Diseases:** heart disease (MONDO:0005267)

## Full-text entities

- **Diseases:** Stroke (MESH:D020521), Skin Cancer (MESH:D012878), obese (MESH:D009765), Asthma (MESH:D001249), HD (MESH:D006816), Diabetic (MESH:D003920), walking (MESH:D013009), Heart disease (MESH:D006331), Kidney Disease (MESH:D007674), Sleep deprivation (MESH:D012892), GB (MESH:D000141), DT (MESH:D020195), acute myocardial infarction (MESH:D009203), cardiovascular disease (MESH:D002318), death (MESH:D003643)
- **Chemicals:** Alcohol (MESH:D000438)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12963821/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12963821/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12963821/full.md

---
Source: https://tomesphere.com/paper/PMC12963821