An "outside the box" solution for imbalanced data classification

Hubert Jegierski; Stanisław Saganowski

arXiv:1911.06965 · cs.LG · November 19, 2019

TL;DR

This paper introduces an "enrichment" technique that adds observations from external data sets to improve classification performance on highly imbalanced data sets. It outperforms existing methods, especially on small data sets.

Contribution

The paper proposes a novel enrichment approach that uses external data to improve classification on imbalanced data sets. It describes three implementation strategies and evaluates them empirically on real-world data sets.
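As a rough illustration of the first strategy named in the abstract (selecting observations randomly), the Python sketch below appends randomly drawn minority-class observations from an external data set to the training set. It is a minimal sketch under stated assumptions, not the authors' implementation: the `(features, label)` tuple representation, the function name `enrich_randomly`, and the choice to draw only minority-class observations are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Assumed representation: (feature vector, class label).
Observation = Tuple[List[float], int]


def enrich_randomly(
    train: Sequence[Observation],
    external: Sequence[Observation],
    minority_label: int,
    n_add: int,
    seed: int = 0,
) -> List[Observation]:
    """Hypothetical random enrichment: return a copy of `train` extended with
    up to `n_add` minority-class observations sampled without replacement
    from `external`."""
    # Assumption: only external observations of the minority class are used.
    candidates = [obs for obs in external if obs[1] == minority_label]
    rng = random.Random(seed)  # local RNG so results are reproducible
    picked = rng.sample(candidates, min(n_add, len(candidates)))
    return list(train) + picked
```

With a fixed `seed` the selection is reproducible, which makes it easier to compare enriched and non-enriched runs of the same classifier.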

Findings

1. Average improvement of 27% in classification quality
2. Best-case improvement of 66%
3. Outperforms state-of-the-art methods by 21% on average

Abstract

A common problem in real-world data sets is class imbalance, which can significantly affect the performance of classifiers. Numerous methods have been proposed to cope with this problem; however, even state-of-the-art methods offer limited improvement (if any) for data sets with critically under-represented minority classes. For such problematic cases, an "outside the box" solution is required. Therefore, we propose a novel technique, called enrichment, which uses information (observations) from external data set(s). We present three approaches to implementing the enrichment technique: (1) selecting observations randomly, (2) iteratively choosing observations that improve the classification result, and (3) adding observations that help the classifier better determine the border between classes. We then thoroughly analyze the developed solutions on ten real-world data…
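Approaches (2) and (3) from the abstract can be sketched in the same spirit. The Python below is a hypothetical illustration, not the paper's algorithm. It assumes a 1-nearest-neighbour classifier, minority-class F1 on a held-out validation set as "the classification result", greedy forward selection for approach (2), and distance to the nearest majority-class training observation as a proxy for "the border between classes" in approach (3). All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Assumed representation: (feature vector, class label).
Observation = Tuple[List[float], int]


def predict_1nn(train: Sequence[Observation], x: List[float]) -> int:
    """Label of the training observation nearest to x (Euclidean distance)."""
    return min(train, key=lambda obs: math.dist(obs[0], x))[1]


def minority_f1(train: Sequence[Observation], test: Sequence[Observation],
                minority_label: int) -> float:
    """F1 score of the minority class for a 1-NN model fitted on `train`."""
    tp = fp = fn = 0
    for x, y in test:
        pred = predict_1nn(train, x)
        if pred == minority_label and y == minority_label:
            tp += 1
        elif pred == minority_label:
            fp += 1
        elif y == minority_label:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def enrich_iteratively(train: Sequence[Observation],
                       external: Sequence[Observation],
                       validation: Sequence[Observation],
                       minority_label: int,
                       max_add: int) -> List[Observation]:
    """Hypothetical approach (2): greedily add the external minority
    observation that most improves validation F1; stop when none helps."""
    current = list(train)
    pool = [obs for obs in external if obs[1] == minority_label]
    best_score = minority_f1(current, validation, minority_label)
    for _ in range(max_add):
        if not pool:
            break
        # Score every remaining candidate if it were added next.
        score, idx = max(
            (minority_f1(current + [cand], validation, minority_label), i)
            for i, cand in enumerate(pool)
        )
        if score <= best_score:
            break  # no candidate improves the result any further
        best_score = score
        current.append(pool.pop(idx))
    return current


def enrich_near_border(train: Sequence[Observation],
                       external: Sequence[Observation],
                       minority_label: int,
                       n_add: int) -> List[Observation]:
    """Hypothetical approach (3): add the external minority observations that
    lie closest to any majority-class training observation."""
    majority = [x for x, y in train if y != minority_label]
    pool = [obs for obs in external if obs[1] == minority_label]
    pool.sort(key=lambda obs: min(math.dist(obs[0], m) for m in majority))
    return list(train) + pool[:n_add]
```

In this sketch the greedy loop re-scores every remaining candidate at each step, so its cost grows with the size of the external pool and the validation set, whereas the border heuristic needs a single pass; that trade-off is one plausible reason to prefer one strategy over the other in a given setting.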
