# An explainable hybrid framework for early detection of cardiovascular diseases using Categorical Boosting and Bees algorithm

**Authors:** Jayanta Sen, Sweta Bhattacharya

PMC · DOI: 10.1038/s41598-025-28514-4 · Scientific Reports · 2025-12-13

## TL;DR

This paper presents a hybrid machine learning framework that combines CatBoost with the Bees algorithm to detect cardiovascular disease on the Framingham dataset, and uses LIME and SHAP to make its predictions interpretable for clinical decision-making.

## Contribution

A hybrid ML framework for early CVD detection that couples CatBoost with the Bees algorithm and explains its predictions with LIME and SHAP, reporting 98.04% accuracy on the Framingham dataset.
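The summary does not say how the Bees algorithm is coupled with CatBoost. One common arrangement, assumed here purely for illustration, is to use it as a search over model hyperparameters with validation error as the objective. The sketch below shows a basic Bees algorithm (scouts, best sites, elite sites, neighbourhood search with a fixed patch size) in plain Python. The objective is a toy quadratic standing in for validation error, and the function name, parameter names, bounds, and the two "hyperparameters" are all hypothetical.

```python
import random


def bees_algorithm(objective, bounds, n_scouts=20, n_best=5, n_elite=2,
                   recruits_elite=8, recruits_best=4, patch=0.1,
                   iterations=50, seed=0):
    """Minimise `objective` over a box given by `bounds` (list of (low, high))."""
    rng = random.Random(seed)

    def random_point():
        # Global search: a scout bee lands anywhere inside the bounds.
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(site):
        # Local search: a recruited bee samples near a site, clipped to the bounds.
        point = []
        for x, (lo, hi) in zip(site, bounds):
            radius = patch * (hi - lo)
            point.append(min(hi, max(lo, x + rng.uniform(-radius, radius))))
        return point

    population = [random_point() for _ in range(n_scouts)]
    for _ in range(iterations):
        population.sort(key=objective)
        next_population = []
        for rank, site in enumerate(population[:n_best]):
            # Elite sites receive more recruits than the other selected sites.
            recruits = recruits_elite if rank < n_elite else recruits_best
            candidates = [site] + [neighbour(site) for _ in range(recruits)]
            next_population.append(min(candidates, key=objective))
        # The remaining bees keep scouting at random.
        next_population += [random_point() for _ in range(n_scouts - n_best)]
        population = next_population
    return min(population, key=objective)


def toy_objective(params):
    # Stand-in for validation error; the minimum is at (0.3, 6.0) by construction.
    learning_rate, depth = params
    return (learning_rate - 0.3) ** 2 + (depth - 6.0) ** 2


search_bounds = [(0.0, 1.0), (1.0, 10.0)]  # hypothetical ranges
best = bees_algorithm(toy_objective, search_bounds)
```

In a real tuning loop the objective would train a model and return its validation error, which makes each evaluation far more expensive than this toy function, so the scout and recruit counts would need to be chosen with that cost in mind.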

## Key findings

- The hybrid model achieved 98.04% accuracy, 97.09% precision, 98.96% recall, a 98.02% F1-score, and 97.16% specificity, outperforming contemporary algorithms on most evaluation metrics.
- The XAI techniques LIME and SHAP were used to explain the model's predictions and to identify the attributes that contribute most to CVD.
- The reported total execution time was 26.6580 s.
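The accuracy, precision, recall, F1-score, and specificity figures are all functions of the binary confusion matrix. As a minimal sketch of how they relate, the function below computes them from the four counts; the function name and the example counts are illustrative and do not come from the paper. The last lines recompute F1 from the precision and recall given in the abstract as a consistency check.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the formulas.
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)

# F1 is the harmonic mean of precision and recall, so the abstract's
# precision (97.09%) and recall (98.96%) should reproduce its F1 (98.02%).
f1_reported = 2 * 97.09 * 98.96 / (97.09 + 98.96)
```

Rounded to two decimals, `f1_reported` agrees with the 98.02% F1-score stated in the abstract.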

## Abstract

Cardiovascular disease (CVD) remains one of the leading causes of death worldwide, claiming millions of lives each year. The early detection of CVD enables healthcare professionals to make informed decisions about the patient’s health. Machine learning (ML)-based frameworks have been extremely popular in predicting diseases. However, traditional ML models are “black boxes” whose results lack transparency and interpretability. The objective of the present study is to develop an ML framework that detects CVD with promising accuracy and, further, provides interpretability for the generated outcomes to support targeted therapies. The Framingham, Massachusetts CVD dataset, which is publicly available from the Kaggle Repository, is used in this study. As part of the data pre-processing, the Random Oversampling (RO) technique is applied to overcome the data imbalance problem, followed by Pearson Correlation analysis to understand the correlation between attributes. Then, the Min–Max scaling technique is used for data normalization. The pre-processed data is fed into a hybrid ML framework incorporating the Categorical Boosting (CatBoost) and Bees algorithms to achieve optimized CVD prediction results. The proposed hybrid model yielded an accuracy of 98.04%, a precision of 97.09%, a recall of 98.96%, an F1-score of 98.02%, and a specificity of 97.16%, with a total execution time of 26.6580 s. The proposed model outperformed contemporary state-of-the-art algorithms on most evaluation metrics. Additionally, Explainable Artificial Intelligence (XAI) techniques, such as LIME and SHAP, are implemented to identify the contribution of the most significant attributes towards the occurrence of CVD, offering valuable insights into the detection of the disease and enabling healthcare providers to make accurate and timely treatment decisions.
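The three pre-processing steps named in the abstract (random oversampling, Pearson correlation, Min–Max scaling) can be sketched in plain Python as follows. This is an illustration of the techniques under stated assumptions, not the authors' code: the abstract does not say which libraries or settings were used, and the helper names, the toy rows, and the two feature columns (age and systolic blood pressure) are hypothetical.

```python
import random
from math import sqrt


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([y] * target)
    return out_rows, out_labels


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def min_max_scale(column):
    """Rescale a column linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical toy rows: [age, systolic blood pressure]; label 1 marks the minority class.
rows = [[39, 106.0], [46, 121.0], [48, 127.5], [61, 150.0], [63, 180.0]]
labels = [0, 0, 0, 0, 1]

X, y = random_oversample(rows, labels)   # class balance becomes 4 vs 4
ages = [row[0] for row in X]
pressures = [row[1] for row in X]
r = pearson(ages, pressures)             # correlation between the two attributes
scaled_ages = min_max_scale(ages)        # values now lie in [0, 1]
```

The steps run in the order the abstract lists them: balance the classes, inspect attribute correlations, then normalize.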

## Linked entities

- **Diseases:** cardiovascular disease (MONDO:0004995)

## Full-text entities

- **Diseases:** CVD (MESH:D002318), death (MESH:D003643)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12756275/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12756275/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC12756275/full.md

---
Source: https://tomesphere.com/paper/PMC12756275