# An explainable AI-driven hybrid feature selection approach for coronary artery disease diagnosis

**Authors:** Tarneem Elemam, Hosam Refaat, Mohamed Makhlouf

PMC · DOI: 10.1038/s41598-026-41712-y · 2026-03-25

## TL;DR

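This paper introduces SHOW, a two-step feature selection method for diagnosing coronary artery disease: it ranks features with SHAP, then selects a subset with a forward-selection wrapper. The authors report higher accuracy with fewer features than 14 competing algorithms.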

## Contribution

The SHAP Optimized Wrapper (SHOW) algorithm combines SHAP-based feature ranking with an optimized sequential forward selection wrapper for coronary artery disease (CAD) diagnosis.
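The two-step idea (rank features by SHAP, then run a wrapper over the ranking) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the per-model normalisation, the single-pass greedy rule, the feature names, the SHAP values, and `toy_score` are all assumptions made for the example. In practice `score` would be a cross-validated metric of a real classifier.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def rank_by_mean_abs_shap(
    shap_by_model: Dict[str, List[List[float]]],
    feature_names: Sequence[str],
) -> List[str]:
    """Rank features by mean |SHAP|, normalised per model and summed across models."""
    totals = {name: 0.0 for name in feature_names}
    for rows in shap_by_model.values():
        # Mean absolute SHAP value of each feature over all samples.
        importance = [
            mean(abs(row[j]) for row in rows) for j in range(len(feature_names))
        ]
        scale = sum(importance) or 1.0  # avoid dividing by zero
        for name, value in zip(feature_names, importance):
            totals[name] += value / scale
    return sorted(feature_names, key=lambda name: totals[name], reverse=True)


def forward_select(
    ranked: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Walk the ranking once, keeping a feature only if the score improves."""
    selected: List[str] = []
    best = float("-inf")
    for feature in ranked:
        candidate = selected + [feature]
        candidate_score = score(candidate)
        if candidate_score > best:
            selected, best = candidate, candidate_score
    return selected, best


# Toy inputs (hypothetical): rows are samples, columns follow `features`.
features = ["age", "chest_pain", "bmi"]
toy_shap = {
    "xgboost": [[0.1, -0.6, 0.0], [-0.2, 0.5, 0.1]],
    "random_forest": [[0.3, 0.4, 0.0], [0.1, -0.4, 0.0]],
}


def toy_score(subset: List[str]) -> float:
    # Stand-in for cross-validated accuracy: rewards two informative
    # features and charges a small cost per selected feature.
    informative = {"chest_pain", "age"}
    return 0.7 + 0.1 * len(informative & set(subset)) - 0.01 * len(subset)


ranked = rank_by_mean_abs_shap(toy_shap, features)
selected, best = forward_select(ranked, toy_score)
print(ranked, selected, round(best, 2))
```

Evaluating candidates in ranked order means the wrapper needs at most one model evaluation per feature, instead of re-testing every remaining feature at each step as plain sequential forward selection does.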

## Key findings

- SHOW outperforms 14 state-of-the-art algorithms in both accuracy and the number of selected features for CAD diagnosis.
- With XGBoost, SHOW reaches 93.79% accuracy on the Z-Alizadeh Sani dataset using only 14 of 55 features.
- SHOW also reports favorable sensitivity, specificity, AUC, and F1-score across three CAD datasets: Z-Alizadeh Sani, Cleveland, and Statlog.
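For readers less familiar with the clinical metrics named above, the sketch below shows the standard definitions of accuracy, sensitivity, specificity, and F1-score in terms of confusion-matrix counts. The counts are made up for illustration and are not taken from the paper. AUC is omitted because it depends on ranked prediction scores rather than on a single confusion matrix.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Share of diseased patients correctly flagged (recall).
        "sensitivity": tp / (tp + fn),
        # Share of healthy patients correctly cleared.
        "specificity": tn / (tn + fp),
        # Harmonic mean of precision and sensitivity.
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for 100 patients, not results from the paper.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```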

## Abstract

Coronary artery disease (CAD), where the heart does not get enough oxygen-rich blood due to a buildup of fatty matter, is a leading cause of death worldwide. Since its symptoms may not be recognized until a cardiac attack occurs, early diagnosis is crucial. In this paper, we introduce the SHAP Optimized Wrapper (SHOW) feature selection algorithm, which works in two steps. First, a SHapley Additive exPlanations (SHAP) method is developed using XGBoost, Random Forest (RF), and Support Vector Machine (SVM) classifiers to rank the features based on their diagnostic significance. Second, an optimized sequential forward selection wrapper technique is employed, whereby the ranked features are evaluated to select the optimal subset. To validate the algorithm, it is applied with seven classifiers to three public domain CAD data sets. The classifiers are XGBoost, RF, SVM, Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP). The data sets are Z-Alizadeh Sani, Cleveland, and Statlog. Using stratified 10-fold cross-validation and careful hyperparameter tuning, the results reveal that the SHOW algorithm significantly outperforms 14 state-of-the-art competitive algorithms in terms of accuracy and the number of selected features, while also demonstrating favorable performance in clinically relevant metrics such as sensitivity, specificity, AUC, and F1-score. For example, using the XGBoost classifier, the algorithm selects 14 features (out of 55) from the Z-Alizadeh Sani data set, achieving 93.79% accuracy, 93.98% sensitivity, 89.81% specificity, 0.97 AUC, and 93.98% F1-score; 5 features (out of 13) from the Cleveland data set, achieving 86.52% accuracy, 88.55% sensitivity, 85% specificity, 0.89 AUC, and 84.84% F1-score; and 5 features (out of 13) from the Statlog data set, achieving 87.78% accuracy, 80% sensitivity, 92.67% specificity, 0.90 AUC, and 85.18% F1-score. These figures are not matched by any of the 14 competitive algorithms.
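The stratified 10-fold cross-validation mentioned in the abstract keeps the ratio of diseased to healthy cases roughly equal in every fold. The sketch below is a simplified standard-library illustration of that idea, assuming a round-robin assignment within each class. It is not the authors' code, and the label vector is invented. Library implementations such as scikit-learn's `StratifiedKFold` balance fold sizes more carefully.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(
    labels: Sequence[int], k: int = 10, seed: int = 0
) -> List[List[int]]:
    """Split sample indices into k folds that preserve class proportions."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical labels: 30 CAD cases (1) and 20 controls (0).
labels = [1] * 30 + [0] * 20
folds = stratified_folds(labels, k=10)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A real pipeline would fit on train_idx and evaluate on test_idx here.
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

To avoid optimistic estimates, feature selection would normally be repeated inside each training split rather than run once on the full data set.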

The online version contains supplementary material available at 10.1038/s41598-026-41712-y.

## Linked entities

- **Diseases:** coronary artery disease (MONDO:0005010), CAD (MONDO:0005010)

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** Parkinson's disease (MESH:D010300), coronary infarction (MESH:C564258), heart failure (MESH:D006333), breast cancer (MESH:D001943), CVDs (MESH:D002318), CAD (MESH:D003324), ML (MESH:C537366), RHD (MESH:D012214), ischemic heart disease (MESH:D017202), cardiomyopathies (MESH:D009202), atherosclerosis (MESH:D050197), congenital heart disease (MESH:D006330), dementia (MESH:D003704), peripheral vascular disease (MESH:D016491), heart attack (MESH:D009203), cerebrovascular disease (MESH:D002561), stroke (MESH:D020521), Chest Pain (MESH:D002637), CHD (MESH:D003327), death (MESH:D003643), cardiac attack (MESH:D006331)
- **Chemicals:** fatty matter (-), oxygen (MESH:D010100)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13031509/full.md

---
Source: https://tomesphere.com/paper/PMC13031509