Influence of Resampling on Accuracy of Imbalanced Classification

Evgeny Burnaev; Pavel Erofeev; Artem Papanov

arXiv:1707.03905·stat.ML·July 14, 2017

TL;DR

This paper investigates how different resampling techniques affect the accuracy of classifiers in imbalanced binary classification tasks, highlighting key points and practical difficulties of resampling.

Contribution

It provides an experimental comparison of resampling methods and discusses their impact on classification accuracy in imbalanced datasets.

Findings

01

Resampling significantly influences classification accuracy.

02

Different resampling methods have varying effectiveness.

03

Key challenges include selecting appropriate resampling techniques.

Abstract

In many real-world binary classification tasks (e.g., detection of certain objects in images), the available dataset is imbalanced, i.e., it contains far fewer representatives of one class (the minor class) than of the other. Accurate prediction of the minor class is generally crucial, but it is hard to achieve because there is little information about the minor class. One approach to this problem is to resample the dataset beforehand, i.e., to add new elements to it or remove existing ones. Resampling can be done in various ways, which raises the problem of choosing the most appropriate one. In this paper we experimentally investigate the impact of resampling on classification accuracy, compare resampling methods, and highlight key points and difficulties of resampling.
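
The two basic forms of resampling mentioned in the abstract are adding elements and removing them. They can be illustrated with the simplest standard variants: random oversampling, which duplicates minor-class rows, and random undersampling, which drops major-class rows. The sketch below is a minimal NumPy illustration under stated assumptions. It assumes binary labels and a known minority label. The helper names `random_oversample` and `random_undersample` are hypothetical. It is not necessarily one of the specific methods compared in the paper.

```python
import numpy as np

def _split_indices(y, minority_label):
    """Return (minority, majority) row indices; assumes binary labels."""
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    if len(minority_idx) > len(majority_idx):
        raise ValueError("minority_label does not mark the smaller class")
    return minority_idx, majority_idx

def random_oversample(X, y, minority_label, rng):
    """Add elements: duplicate random minority rows until classes are balanced."""
    minority_idx, majority_idx = _split_indices(y, minority_label)
    n_extra = len(majority_idx) - len(minority_idx)
    # Sampling with replacement: a minority row may be duplicated several times.
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([majority_idx, minority_idx, extra])
    return X[keep], y[keep]

def random_undersample(X, y, minority_label, rng):
    """Remove elements: keep a random subset of majority rows of minority size."""
    minority_idx, majority_idx = _split_indices(y, minority_label)
    # Sampling without replacement: each kept majority row appears exactly once.
    kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
    keep = np.concatenate([kept_majority, minority_idx])
    return X[keep], y[keep]

# Toy imbalanced dataset: 8 rows of class 0, 2 rows of class 1.
rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_over, y_over = random_oversample(X, y, minority_label=1, rng=rng)
X_under, y_under = random_undersample(X, y, minority_label=1, rng=rng)
print(np.bincount(y_over))   # [8 8]
print(np.bincount(y_under))  # [2 2]
```

The two variants trade off differently. Oversampling keeps all original data but repeats minor-class rows, which can encourage a classifier to overfit them. Undersampling discards information about the major class. This trade-off is one reason the choice of resampling method matters.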
