# Clustering-cum-regression based model and performance analysis for early prediction of heart disease

**Authors:** Manoj Tolani, Yazeed AlZahrani, Gaurav Suman, Pankaj Kumar, Arun Balodi, Ambar Bajpai

PMC · DOI: 10.1038/s41598-026-40626-z · Scientific Reports · 2026-02-18

## TL;DR

This paper introduces a hybrid model that combines K-Means clustering with Random Forest regression for early prediction of heart disease, reporting 91% accuracy compared with up to 85% for earlier approaches.

## Contribution

A novel hybrid model integrating K-Means clustering and Random Forest regression for improved heart disease prediction accuracy and performance.

## Key findings

- The hybrid model achieved 91% accuracy; earlier approaches reported accuracies up to 85%.
- The proposed model also reported improved recall (0.8864), specificity (0.9583), F1 score (0.8977), and ROC-AUC (0.9155).
- The authors report that these gains were obtained without increasing model complexity.
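The metrics named in the findings above are all derived from a binary confusion matrix. As a minimal sketch of how they relate, the snippet below computes them from raw counts. The function name `binary_metrics` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the paper's confusion matrix, which this summary does not report.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity: share of diseased cases detected
    specificity = tn / (tn + fp)   # share of healthy cases correctly cleared
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts (illustration only, NOT taken from the paper).
metrics = binary_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

ROC-AUC is not included here because it depends on the ranking of predicted scores rather than on a single confusion matrix.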

## Abstract

In real-time health monitoring systems, Wireless Body Area Networks (WBAN) are widely recognized for collecting various disease parameters using sensors. The collected data can be used for the early prediction of diseases. To address the growing need for accurate and efficient heart disease prediction, we introduce a novel hybrid approach that combines K-Means clustering with advanced regression techniques to analyze various factors in heart health monitoring. This integrated method utilizes the strengths of unsupervised and supervised learning to enhance predictive accuracy across both training and testing datasets. Our analysis focuses on 12 critical feature parameters, systematically clustered using K-Means to uncover inherent patterns and relationships. These parameters are then rigorously evaluated through multiple regression models to determine their predictive significance. By employing K-Means to assess parameter relevance within defined ranges, the proposed framework ensures robust feature selection and improved model interpretability. To validate its effectiveness, we benchmark our approach against widely used machine learning models, including Decision Tree Regression, K-Nearest Neighbor, Support Vector Machine (SVM), Kernel SVM, and others. The results demonstrate that our method not only outperforms traditional techniques but also offers a scalable and reliable solution for real-world healthcare applications. The prediction accuracy and false-prediction performance parameters were analyzed to compare the proposed method with existing heart disease prediction models. Earlier approaches reported accuracies up to 85%, with limited improvement in recall, specificity, and F1 score. In contrast, the newly proposed hybrid model, which integrates Random Forest regression with K-Means clustering, achieved a significantly higher accuracy of 91%, along with improved recall (0.8864), specificity (0.9583), F1 score (0.8977), and ROC-AUC (0.9155). These quantitative performance gains, obtained without increasing model complexity, clearly demonstrate the superiority and robustness of the proposed approach over traditional prediction methods.
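The abstract describes the pipeline only at a high level, so the following is a rough sketch of one way K-Means clustering can be combined with Random Forest regression, not the authors' implementation. It rests on several assumptions that do not come from the paper: the data are synthetic (generated with 12 features to mirror the feature count in the abstract), the cluster ID is simply appended as an extra feature, the regression output is thresholded at 0.5 to obtain a class label, and the helper names `fit_hybrid` and `predict_hybrid` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def fit_hybrid(X_train, y_train, n_clusters=2, seed=0):
    """Fit scaler, K-Means, and a Random Forest regressor on cluster-augmented features."""
    scaler = StandardScaler().fit(X_train)
    X_scaled = scaler.transform(X_train)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_scaled)
    # Assumption: the cluster ID is appended as one additional input feature.
    X_aug = np.column_stack([X_scaled, kmeans.predict(X_scaled)])
    forest = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_aug, y_train)
    return scaler, kmeans, forest


def predict_hybrid(model, X, threshold=0.5):
    """Predict 0/1 labels by thresholding the regressor's continuous output."""
    scaler, kmeans, forest = model
    X_scaled = scaler.transform(X)
    X_aug = np.column_stack([X_scaled, kmeans.predict(X_scaled)])
    return (forest.predict(X_aug) >= threshold).astype(int)


# Synthetic stand-in data (NOT the paper's dataset): 12 features, binary target.
X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = fit_hybrid(X_tr, y_tr)
y_pred = predict_hybrid(model, X_te)
print("test accuracy:", accuracy_score(y_te, y_pred))
```

The scaler and K-Means are fitted on the training split only and then reused on the test split, which keeps test data from leaking into the clustering step. Accuracy on this synthetic data says nothing about the figures reported in the paper.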

## Linked entities

- **Diseases:** heart disease (MONDO:0005267)

## Full-text entities

- **Diseases:** heart disease (MESH:D006331)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13004894/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13004894/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/PMC13004894/full.md

---
Source: https://tomesphere.com/paper/PMC13004894