Interpretable Heart Disease Prediction via a Weighted Ensemble Model: A Large-Scale Study with SHAP and Surrogate Decision Trees

Md Abrar Hasnat; Md Jobayer; Md. Mehedi Hasan Shawon; Md. Golam Rabiul Alam

arXiv:2511.01947·cs.LG·November 5, 2025

TL;DR

This study develops an interpretable, high-performing ensemble model that combines tree-based methods with a CNN for heart disease prediction, and uses explainability tools such as SHAP and surrogate decision trees to improve clinical transparency.

Contribution

Introduces a weighted ensemble that integrates tree-based models (LightGBM, XGBoost) and a CNN with explainability techniques (SHAP, surrogate decision trees) for interpretable heart disease risk prediction on a dataset of 229,781 patients.

Findings

01

Achieved a test AUC of 0.8371, a statistically significant improvement over the best individual model (p=0.003).

02

Reached a recall of 80.0%, which makes the model well suited to screening applications.

03

Provides transparency into its predictions through SHAP and surrogate decision trees.
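The two headline metrics in these findings, AUC and recall, can both be computed from a model's predicted scores. The following is a minimal pure-Python sketch, not the authors' evaluation code: the function names `recall_at_threshold` and `roc_auc`, the 0.5 threshold, and the toy data are all assumptions made for illustration.

```python
def recall_at_threshold(y_true, scores, threshold=0.5):
    """Fraction of true positives (label 1) whose score reaches the threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive examples")
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), so it is
    meant for small illustrative inputs rather than a full-size dataset.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores (hypothetical, not taken from the paper).
toy_labels = [1, 1, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.1, 0.8]
toy_recall = recall_at_threshold(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Lowering the threshold raises recall at the cost of more false positives, which is the usual trade-off accepted in a screening setting. AUC does not depend on the threshold.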

Abstract

Cardiovascular disease (CVD) remains a critical global health concern, demanding reliable and interpretable predictive models for early risk assessment. This study presents a large-scale analysis using the Heart Disease Health Indicators Dataset, developing a strategically weighted ensemble model that combines tree-based methods (LightGBM, XGBoost) with a Convolutional Neural Network (CNN) to predict CVD risk. The model was trained on a preprocessed dataset of 229,781 patients; the inherent class imbalance was managed through strategic weighting, and feature engineering expanded the original 22 features to 25. The final ensemble achieves a statistically significant improvement over the best individual model, with a test AUC of 0.8371 (p=0.003), and is particularly suited to screening, with a high recall of 80.0%. To provide transparency and clinical interpretability, surrogate…
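One common way to build a weighted ensemble of this kind is to blend each model's predicted probabilities with a normalized weighted average. The sketch below illustrates that idea only. The abstract does not state the paper's weights or its exact combination rule, so the function name `weighted_ensemble`, the weights, and the per-model probabilities are all hypothetical.

```python
def weighted_ensemble(prob_lists, weights):
    """Blend per-model probability lists into one list by weighted average.

    prob_lists: one list of predicted probabilities per model, all equal length.
    weights: one non-negative weight per model; normalized internally.
    """
    if len(prob_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    if not prob_lists:
        raise ValueError("need at least one model")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all models must score the same patients")
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n)
    ]


# Hypothetical probabilities from three models for two patients.
lgbm_probs = [0.2, 0.8]
xgb_probs = [0.4, 0.6]
cnn_probs = [0.6, 0.4]

# Hypothetical weights; in practice they would be tuned on validation data.
blended = weighted_ensemble([lgbm_probs, xgb_probs, cnn_probs], [0.5, 0.3, 0.2])
```

Because the weights are normalized, the blended values stay in [0, 1] whenever the inputs do, so they can be thresholded like any single model's output.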

Taxonomy

Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)