Interpretable Heart Disease Prediction via a Weighted Ensemble Model: A Large-Scale Study with SHAP and Surrogate Decision Trees
Md Abrar Hasnat, Md Jobayer, Md. Mehedi Hasan Shawon, Md. Golam Rabiul Alam

TL;DR
This study develops an interpretable, high-performing ensemble model combining tree-based methods and CNN for heart disease prediction, utilizing explainability tools like SHAP and surrogate trees to enhance clinical transparency.
Contribution
Introduces a weighted ensemble model integrating multiple architectures with explainability techniques for improved, interpretable heart disease risk prediction on large-scale data.
Findings
Achieved a Test AUC of 0.8371, significantly better than individual models.
High recall of 80% suitable for screening applications.
Model provides transparency through SHAP and surrogate decision trees.
Abstract
Cardiovascular disease (CVD) remains a critical global health concern, demanding reliable and interpretable predictive models for early risk assessment. This study presents a large-scale analysis using the Heart Disease Health Indicators Dataset, developing a strategically weighted ensemble model that combines tree-based methods (LightGBM, XGBoost) with a Convolutional Neural Network (CNN) to predict CVD risk. The model was trained on a preprocessed dataset of 229,781 patients where the inherent class imbalance was managed through strategic weighting and feature engineering enhanced the original 22 features to 25. The final ensemble achieves a statistically significant improvement over the best individual model, with a Test AUC of 0.8371 (p=0.003) and is particularly suited for screening with a high recall of 80.0%. To provide transparency and clinical interpretability, surrogate…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsArtificial Intelligence in Healthcare · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)
