# Padding interpolation, median imputation, RobustScalar, and particle swarm optimization with heterogeneous classifiers: a robust combination for effective heart disease diagnosis

**Authors:** Sanjay Dhanka, Ankur Kumar, Surita Maini, Nitin Kumar, Jeewan Singh, Mudassir Khan, Mohamed Abbas, Amel Ksibi

PMC · DOI: 10.3389/fmed.2025.1721740 · Frontiers in Medicine · 2026-01-12

## TL;DR

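This paper presents a machine learning framework for heart disease diagnosis that pairs a three-step preprocessing pipeline (padding interpolation, median imputation, and robust scaling) with an Improved Particle Swarm Optimization (IPSO) algorithm for feature selection and hyperparameter tuning. The best model, IPSO-optimized XGBoost, reached 91.3% accuracy.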

## Contribution

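The paper contributes a diagnostic framework in which IPSO, enhanced with a dynamic inertia weight and a mutation operator, both selects features and tunes the hyperparameters of five classifiers, applied after padding interpolation, median imputation, and robust scaling.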

## Key findings

- The proposed IPSO-optimized XGBoost model achieved 91.3% accuracy in heart disease diagnosis at a 90:10 train-test split.
- The model generalized consistently to the independent Cleveland and Statlog datasets.
- Statistical tests confirmed significant improvements over baseline models (p < 0.05).

## Abstract

Heart disease is a leading cause of death worldwide, necessitating accurate early diagnosis. Although machine learning (ML) shows potential for this task, many current models are hindered by data inconsistencies, poor feature selection, and limited robustness.

This study proposes a novel, robust diagnostic framework. It employs advanced data preprocessing using Padding Interpolation for missing values, Median Imputation for outliers, and RobustScalar for scaling to ensure data integrity. A key innovation is an Improved Particle Swarm Optimization (IPSO) algorithm, enhanced with dynamic inertia weight and a mutation operator to avoid premature convergence. This IPSO performs dual optimization: selecting optimal features and tuning the hyperparameters of five classifiers (Logistic Regression, Linear Discriminant Analysis, Gaussian Naïve Bayes, Support Vector Classifier, and XGBoost).
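
The two IPSO enhancements named above, a dynamic inertia weight and a mutation operator, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear inertia schedule, the uniform re-sampling mutation, all parameter values, and the function names `ipso` and `sphere` are assumptions chosen for illustration. The toy `sphere` objective stands in for whatever fitness the paper uses (for example, a classifier's validation error), and the encoding of feature subsets and hyperparameters into particle positions is not shown.

```python
import random


def ipso(objective, dim, bounds, n_particles=20, n_iter=100,
         w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, p_mut=0.1, seed=0):
    """Minimize `objective` with a PSO variant (illustrative parameters only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iter):
        # Dynamic inertia (assumed linear decay): large early for exploration,
        # small late for exploitation.
        w = w_max - (w_max - w_min) * t / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            # Mutation (assumed form): occasionally re-sample one coordinate
            # so the swarm does not collapse prematurely onto one point.
            if rng.random() < p_mut:
                pos[i][rng.randrange(dim)] = rng.uniform(lo, hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


best, best_val = ipso(sphere, dim=5, bounds=(-5.0, 5.0))
```

In a setting like the paper's, one would presumably replace `sphere` with a function that decodes a particle into a feature mask plus classifier hyperparameters and returns a validation error, but that decoding is a design choice this summary does not specify.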

The framework was evaluated on a composite dataset from five public repositories. The proposed IPSO-optimized XGBoost model achieved superior performance at a 90:10 train-test split, with an accuracy of 91.3%, sensitivity of 88.37%, specificity of 93.88%, precision of 92.68%, F1-score of 90.48%, and a Diagnostic Odds Ratio of 116.53. Statistical tests (p < 0.05) confirmed these improvements over baselines were significant. The model also demonstrated consistent generalizability on independent Cleveland and Statlog datasets.
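
As a consistency check on the reported figures, the sketch below computes each metric from a confusion matrix. The counts (38 true positives, 5 false negatives, 46 true negatives, 3 false positives, 92 test cases in total) are not stated in this summary; they are an assumed reconstruction that happens to reproduce every reported value to the stated rounding. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on diseased cases
    specificity = tn / (tn + fp)            # recall on healthy cases
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Diagnostic Odds Ratio: odds of a positive test in diseased cases
    # divided by the odds of a positive test in healthy cases.
    dor = (tp * tn) / (fp * fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "dor": dor,
    }


# Assumed counts (a reconstruction, not taken from the paper).
m = diagnostic_metrics(tp=38, fn=5, tn=46, fp=3)
for name, value in m.items():
    shown = value if name == "dor" else value * 100
    print(f"{name}: {shown:.2f}")
```

The Diagnostic Odds Ratio here equals (38 × 46) / (3 × 5), which also matches the equivalent form built from sensitivity and specificity, [sens / (1 − sens)] × [spec / (1 − spec)].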

The results establish that the integrated framework of rigorous preprocessing and the hybrid IPSO optimization-classification model creates a highly effective and generalizable pipeline for automated heart disease diagnosis, addressing key limitations of existing approaches.

Hybrid Advanced Machine Learning Models.
Flowchart of the heart disease diagnosis process. Datasets from five sources pass through preprocessing (encoding and scaling) and are split into training and testing sets. Improved Particle Swarm Optimization then refines the models, and the resulting hybrid optimized ML models are evaluated with performance metrics and validation before being used for diagnosis from input parameters and metadata.

## Linked entities

- **Diseases:** heart disease (MONDO:0005267)

## Full-text entities

- **Diseases:** Heart disease (MESH:D006331), death (MESH:D003643)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12832926/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12832926/full.md

## References

72 references — full list in the complete paper: https://tomesphere.com/paper/PMC12832926/full.md

---
Source: https://tomesphere.com/paper/PMC12832926