# An approach for handling imbalanced datasets using borderline shifting

**Authors:** Mohammed G. Malhat, Alaa M. Elsobky, Arabi EI. Keshk, Hanaa A. Abdallah, Mahmoud Hussein

PMC · DOI: 10.1038/s41598-026-39118-x · Scientific Reports · 2026-03-04

## TL;DR

This paper introduces a new method called Borderline Shifting to improve classification in imbalanced datasets by focusing on borderline instances.

## Contribution

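Borderline Shifting, a new resampling method, outperforms seven established resampling techniques on 30 benchmark imbalanced datasets, across three classifiers (Random Forest, Naïve Bayes, SVM) and five metrics (F1-score, G-mean, AUC, recall, precision).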

## Key findings

- Borderline Shifting achieved an average F1-score of 0.83 ± 0.06 with SVM, outperforming conventional methods such as SMOTE and Borderline-SMOTE.
- The method significantly improved Naïve Bayes' AUC from 0.68 to 0.84 ± 0.06.
- Random Forest showed the highest G-mean of 0.88 ± 0.04 with Borderline Shifting.

## Abstract

In supervised learning tasks, class imbalance is a persistent problem that often leads to biased classification models that prioritize the majority class over the minority. To tackle this problem, we present a new resampling method called Borderline Shifting, which strengthens the model’s capacity to distinguish between classes close to the decision boundary by selectively enhancing significant borderline instances. Using 30 benchmark imbalanced datasets, this study compares the proposed method to 7 popular resampling techniques: Random Under Sampling (RUS), Random Over Sampling (ROS), SMOTE, Borderline-SMOTE, NearMiss, SMOTE-Tomek, and SMOTEENN. Performance was assessed using three well-known classifiers: Random Forest (RF), Naïve Bayes (NB), and Support Vector Machine (SVM). Evaluation metrics included F1-score, G-mean, AUC, recall, and precision. The findings show that the Borderline Shifting method consistently produced better results for every metric and classifier. With SVM, our approach outperformed conventional methods like SMOTE and Borderline-SMOTE, achieving an average F1-score of 0.83 ± 0.06, G-mean of 0.86 ± 0.05, and AUC of 0.89 ± 0.04. For Naïve Bayes, which is usually sensitive to data imbalance, our method significantly improved the F1-score from 0.62 (baseline) to 0.78 ± 0.07 and the AUC from 0.68 to 0.84 ± 0.06. The robust Random Forest also benefited greatly: our approach produced the highest overall G-mean of 0.88 ± 0.04 and a stable AUC of 0.91 ± 0.03 with little variation between datasets. These findings show that the suggested Borderline Shifting approach not only addresses the imbalance issue more successfully than current approaches but also improves classification performance consistently across various learning models. This makes it a viable and broadly applicable solution for real-world imbalanced learning scenarios.
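To make the notion of a "borderline instance" concrete, here is a minimal sketch of the neighbourhood rule used by Borderline-SMOTE, one of the baselines named in the abstract. It is not the Borderline Shifting algorithm, whose details are only in the full paper. The function names, the toy data, and the choice of `k=3` are illustrative assumptions.

```python
import math


def k_nearest(points, idx, k):
    """Return the indices of the k nearest neighbours of points[idx] (Euclidean)."""
    dists = [
        (math.dist(points[idx], p), j)
        for j, p in enumerate(points)
        if j != idx
    ]
    dists.sort()
    return [j for _, j in dists[:k]]


def categorize_minority(points, labels, minority_label, k=5):
    """Tag each minority instance as 'safe', 'danger' (borderline) or 'noise'.

    Borderline-SMOTE rule, with m the number of majority-class points among
    the k nearest neighbours: noise if m == k, danger if k/2 <= m < k,
    safe otherwise.
    """
    result = {}
    for i, label in enumerate(labels):
        if label != minority_label:
            continue
        neighbours = k_nearest(points, i, k)
        majority_count = sum(1 for j in neighbours if labels[j] != minority_label)
        if majority_count == k:
            result[i] = "noise"
        elif 2 * majority_count >= k:
            result[i] = "danger"
        else:
            result[i] = "safe"
    return result


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
points = [
    (0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),  # majority
    (3.4, 0.0),                                      # minority, next to the majority
    (5.0, 0.0), (5.5, 0.0), (6.1, 0.0),              # minority cluster
    (1.4, 0.0),                                      # minority, inside the majority
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

categories = categorize_minority(points, labels, minority_label=1, k=3)
print(categories)
```

On this toy set, the point at `(3.4, 0.0)` is the only one tagged `danger`, since two of its three nearest neighbours belong to the majority class. Resampling methods that target the boundary concentrate their effort on points of this kind.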

The online version contains supplementary material available at 10.1038/s41598-026-39118-x.
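The metrics listed in the abstract are chosen because plain accuracy can look good on imbalanced data even when the minority class is ignored. The sketch below computes precision, recall, F1-score and G-mean from a binary confusion matrix, assuming the common definition of G-mean as the geometric mean of recall (sensitivity) and specificity. The function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Compute common imbalance-aware metrics; the minority class is 'positive'."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical test set: 10 minority and 90 majority samples.
# A classifier that always predicts the majority class:
always_majority = imbalance_metrics(tp=0, fp=0, tn=90, fn=10)
# A classifier that recovers most of the minority class:
balanced = imbalance_metrics(tp=8, fp=4, tn=86, fn=2)

for name, scores in [("always majority", always_majority), ("balanced", balanced)]:
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

The always-majority classifier reaches 0.90 accuracy but scores 0 on recall, F1-score and G-mean, which is why imbalance studies report these metrics rather than accuracy alone.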

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12963434/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12963434/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/PMC12963434/full.md

---
Source: https://tomesphere.com/paper/PMC12963434