A Study imbalance handling by various data sampling methods in binary classification

Mohamed Hamama

arXiv:2105.10959·cs.LG·May 25, 2021·1 cites

Open Access

TL;DR

This paper explores various data sampling techniques to address class imbalance in binary classification, demonstrating their impact on model performance through experiments on a Kaggle dataset.

Contribution

It provides a comparative analysis of over-sampling and under-sampling methods for handling data imbalance in binary classification tasks.

Findings

01

Sampling methods improve class balance and model accuracy

02

Over-sampling and under-sampling have different effects on performance

03

The study highlights gaps for future research in imbalance handling

Abstract

The purpose of this research report is to present our learning curve and our exposure to the machine learning life cycle. Using a Kaggle binary classification data set, we explore various techniques from pre-processing through to final optimization and model evaluation. We also highlight the data imbalance issue and discuss different methods of handling that imbalance at the data level, by over-sampling and under-sampling, not only to reach a balanced class representation but also to improve overall performance. This work also identifies some gaps for future work.
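To make the data-level idea in the abstract concrete, the sketch below shows the two simplest resampling strategies: random over-sampling (duplicating minority rows) and random under-sampling (discarding majority rows). It is an illustration only. This summary does not list the exact sampling methods the paper evaluates, so the function names `random_oversample` and `random_undersample`, the toy data, and the fixed seed are all assumptions made for the example, not the author's implementation.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw with replacement, since the minority class has too few rows.
        extra = rng.choices(idx, k=target - count)
        X_out.extend(X[i] for i in extra)
        y_out.extend(y[i] for i in extra)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has
    as many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw without replacement, so no row is kept twice.
        keep.extend(rng.sample(idx, target))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 8 majority rows (label 0), 2 minority rows (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(Counter(y_over))   # 8 rows per class
print(Counter(y_under))  # 2 rows per class
```

Over-sampling keeps all of the original information but repeats minority rows, which can encourage overfitting. Under-sampling yields a smaller training set but throws majority rows away. In practice, resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.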

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications