A Study imbalance handling by various data sampling methods in binary classification
Mohamed Hamama

TL;DR
This paper explores various data sampling techniques to address class imbalance in binary classification, demonstrating their impact on model performance through experiments on a Kaggle dataset.
Contribution
It provides a comparative analysis of over-sampling and under-sampling methods for handling data imbalance in binary classification tasks.
Findings
Sampling methods improve class balance and model accuracy
Over-sampling and under-sampling have different effects on performance
The study highlights gaps for future research in imbalance handling
Abstract
The purpose of this research report is to present our learning curve and our exposure to the Machine Learning life cycle. Using a Kaggle binary classification data set, we explore various techniques from pre-processing through final optimization and model evaluation. We also highlight the data imbalance issue and discuss different methods of handling that imbalance at the data level, by over-sampling and under-sampling, not only to reach a balanced class representation but also to improve overall performance. This work also identifies some gaps for future work.
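The two data-level strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the simplest variants, random over-sampling (duplicating minority rows with replacement) and random under-sampling (discarding majority rows), and the function names, the toy data, and the 90/10 class split are all hypothetical. The paper may use other variants of these methods.

```python
import random
from collections import Counter


def group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes (with replacement)
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset (without replacement) of each class so that
    every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 90 negatives and 10 positives, one feature each.
rows = [[float(i)] for i in range(100)]
labels = [0] * 90 + [1] * 10

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))   # both classes at 90 rows
print(Counter(under_labels))  # both classes at 10 rows
```

Over-sampling keeps every original row but repeats minority rows, while under-sampling discards majority rows, which is one reason the two can affect performance differently. In practice, resampling is usually applied to the training split only, so that evaluation data keeps the original class distribution.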
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
