# Hybrid Synthetic Minority Over-sampling Technique (HSMOTE) and Ensemble Deep Dynamic Classifier Model (EDDCM) for big data analytics

**Authors:** Priyadharsini M, Bhawana Tyagi, Naga Priyadarsini R, Mohankumar B

PMC · DOI: 10.1038/s41598-025-23062-3 · 2025-11-11

## TL;DR

This paper introduces a hybrid framework combining HSMOTE and EDDCM to improve big data classification by addressing class imbalance and high dimensionality.

## Contribution

The framework pairs HSMOTE oversampling with meta-heuristic ensemble feature selection (OEFSM) and a deep ensemble classifier (EDDCM) to improve classification on imbalanced, high-dimensional datasets.

## Key findings

- HSMOTE improves minority class representation by interpolating closely located instances.
- EDDCM combines DWCNN, DWBi-LSTM, and WAE with dynamic ensemble strategies for reliable predictions.
- The framework outperforms conventional models in precision, recall, F-measure, and accuracy on imbalanced datasets.
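
The interpolation idea in the first finding can be illustrated with a minimal sketch. It shows the classic SMOTE step (pick a minority sample, pick one of its nearest minority neighbours, place a new point on the segment between them), not the paper's exact HSMOTE procedure, whose hybrid details are not given in this summary. The function name `smote_like_oversample`, the parameter `k`, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.

    Illustrative SMOTE-style step only; not the paper's HSMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies, which is why interpolating between close neighbours tends to be safer than adding random noise.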

## Abstract

Big Data Classification (BDC) has become increasingly important across domains such as healthcare, e-commerce, and banking. However, challenges such as high dimensionality and class imbalance often degrade the performance of conventional machine learning (ML) models. This study proposes a hybrid framework that integrates meta-heuristic optimization with class imbalance handling to enhance BDC effectiveness. To address the class imbalance problem in both binary and multi-class datasets, a Hybrid Synthetic Minority Over-sampling Technique (HSMOTE) is introduced. HSMOTE generates synthetic minority samples by interpolating between closely located minority instances, improving the representation of rare classes. For robust feature selection, the Optimization Ensemble Feature Selection Model (OEFSM) is developed by combining the outputs of three algorithms: Fuzzy Weight Dragonfly Algorithm (FWDFA), Adaptive Elephant Herding Optimization (AEHO), and Fuzzy Weight Grey Wolf Optimization (FWGWO). These algorithms contribute diverse search strategies to improve feature relevance and reduce redundancy. To handle classification, the Ensemble Deep Dynamic Classifier Model (EDDCM) is proposed. EDDCM incorporates three deep learning (DL) architectures: Density Weighted Convolutional Neural Network (DWCNN), Density Weighted Bi-Directional Long Short-Term Memory (DWBi-LSTM), and Weighted Autoencoder (WAE). Their outputs are aggregated using a dynamic ensemble strategy that considers both accuracy and diversity to improve final prediction reliability. All models are implemented in MATLAB (2014a), and performance is evaluated using precision, recall, F-measure, and accuracy. The proposed framework demonstrates improved classification results across various datasets, particularly under conditions of imbalance and high dimensionality.
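
One way to picture an ensemble that "considers both accuracy and diversity" is sketched below. The abstract does not give the weighting formula, so everything here is an assumption: each model's score is a convex mix of its validation accuracy and its mean disagreement with the other models (controlled by a hypothetical `alpha`), the scores are normalised into weights, and class probabilities are combined by weighted sum. The model names come from the abstract; the predictions and probabilities are made-up toy values.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a, preds_b):
    """Fraction of samples on which two models predict different classes."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def ensemble_weights(val_preds, labels, alpha=0.7):
    """Assumed scoring rule: alpha * accuracy + (1 - alpha) * diversity,
    where diversity is the mean disagreement with the other models.
    Scores are normalised so the weights sum to 1."""
    names = list(val_preds)
    raw = {}
    for name in names:
        acc = accuracy(val_preds[name], labels)
        others = [n for n in names if n != name]
        div = sum(disagreement(val_preds[name], val_preds[o]) for o in others) / len(others)
        raw[name] = alpha * acc + (1 - alpha) * div
    total = sum(raw.values())
    return {name: score / total for name, score in raw.items()}


def weighted_vote(probs, weights):
    """Combine per-model class probabilities and return the winning class index."""
    n_classes = len(next(iter(probs.values())))
    combined = [sum(weights[m] * probs[m][c] for m in probs) for c in range(n_classes)]
    return max(range(n_classes), key=combined.__getitem__)


# Hypothetical validation labels and per-model predictions.
labels = [0, 1, 1, 0, 1]
val_preds = {
    "DWCNN": [0, 1, 1, 0, 0],
    "DWBi-LSTM": [0, 1, 0, 0, 1],
    "WAE": [1, 1, 1, 0, 1],
}
weights = ensemble_weights(val_preds, labels)

# Hypothetical class probabilities from each model for one new sample.
probs = {"DWCNN": [0.6, 0.4], "DWBi-LSTM": [0.3, 0.7], "WAE": [0.2, 0.8]}
predicted = weighted_vote(probs, weights)
```

Recomputing the weights on a fresh validation window is one plausible reading of "dynamic"; the paper may define it differently.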

## Full-text entities

- **Diseases:** malignancy (MESH:D009369), HS (MESH:D015456), AI (MESH:C538142), MLDS (MESH:C535504), CMIM (MESH:D020763), MTC (MESH:C536911), HSMOTE (MESH:D006963), round blue cell tumors (MESH:D058405), AEHO (MESH:D016715), diffuse large cell lymphoma (MESH:D016403), brain tissue lesions (MESH:D001927), breast (MESH:D061325), UCI (MESH:D004670), SMOTE (MESH:D004832), ALGORITHM (MESH:D007859), multiple sclerosis (MESH:D009103), FS (MESH:D019846), Breast Cancer (MESH:D001943), medical diseases (MESH:D000069279), DF (MESH:D010300), EDDCM (MESH:D004195), MLP (MESH:D015161), PROMETHEE (MESH:D000092124), CKD (MESH:D012080), BD (MESH:D001528), prostate cancer (MESH:D011471), oral cancer (MESH:D009062), HD (MESH:D008228), diabetes (MESH:D003920), WAE (MESH:D015431), Hepatitis (MESH:D056486), leukaemia (MESH:D015458), CVD (MESH:D002318)
- **Chemicals:** DWBi (-)
- **Species:** Cuculus canorus (common cuckoo, species) [taxon 55661], Homo sapiens (human, species) [taxon 9606], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932]

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12614794/full.md

---
Source: https://tomesphere.com/paper/PMC12614794