An "outside the box" solution for imbalanced data classification
Hubert Jegierski, Stanisław Saganowski

TL;DR
This paper introduces an innovative 'enrichment' technique that leverages external data to significantly improve classification performance on highly imbalanced datasets, outperforming existing methods especially on small datasets.
Contribution
The paper proposes a novel enrichment approach using external data to enhance classification in imbalanced datasets, with three implementation strategies and extensive validation.
Findings
Average improvement of 27% in classification quality
Best case improvement of 66%
Outperforms state-of-the-art methods by 21% on average
Abstract
A common problem in real-world data sets is class imbalance, which can significantly degrade the performance of classifiers. Numerous methods have been proposed to cope with this problem; however, even state-of-the-art methods offer limited improvement (if any) for data sets with critically under-represented minority classes. Such problematic cases require an "outside the box" solution. Therefore, we propose a novel technique, called enrichment, which uses information (observations) from external data set(s). We present three approaches to implementing the enrichment technique: (1) selecting observations randomly, (2) iteratively choosing observations that improve the classification result, and (3) adding observations that help the classifier better determine the border between classes. We then thoroughly analyze the developed solutions on ten real-world data…
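The first two enrichment approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the one-dimensional toy data, the nearest-centroid classifier, and the use of balanced accuracy on a held-out validation set are all assumptions made for illustration. Approach (3) is not sketched, since the abstract does not say how border-relevant observations are identified.

```python
import random

def balanced_accuracy(train, validation):
    """Toy scorer (assumption): nearest-centroid classifier on 1-D features,
    scored by the mean of per-class recall on a validation set."""
    centroids = {}
    for label in {lab for _, lab in train}:
        xs = [x for x, lab in train if lab == label]
        centroids[label] = sum(xs) / len(xs)
    recalls = []
    for label in sorted({lab for _, lab in validation}):
        xs = [x for x, lab in validation if lab == label]
        hits = sum(
            1 for x in xs
            if min(centroids, key=lambda c: abs(x - centroids[c])) == label
        )
        recalls.append(hits / len(xs))
    return sum(recalls) / len(recalls)

def enrich_random(train, external, k, seed=0):
    """Approach (1): add k randomly chosen external observations."""
    rng = random.Random(seed)
    return train + rng.sample(external, min(k, len(external)))

def enrich_greedy(train, external, score_fn, max_add):
    """Approach (2): repeatedly add the single external observation that
    improves the score the most; stop as soon as no candidate helps."""
    current, pool = list(train), list(external)
    best = score_fn(current)
    for _ in range(max_add):
        if not pool:
            break
        top_score, top_idx = max(
            (score_fn(current + [obs]), i) for i, obs in enumerate(pool)
        )
        if top_score <= best:
            break
        best = top_score
        current.append(pool.pop(top_idx))
    return current

# Hypothetical toy data: (feature, label) pairs, label 1 is the minority class.
train = [(0.0, 0), (0.2, 0), (0.4, 0), (0.1, 0), (0.3, 0), (1.6, 1)]
validation = [(0.1, 0), (0.3, 0), (0.5, 0), (0.6, 1), (0.7, 1), (0.8, 1)]
external = [(0.6, 1), (0.9, 1), (2.5, 1)]  # minority observations from another data set

score = lambda data: balanced_accuracy(data, validation)
random_enriched = enrich_random(train, external, k=2)
greedy_enriched = enrich_greedy(train, external, score, max_add=3)
print(score(train), score(random_enriched), score(greedy_enriched))
```

In this toy setup the greedy variant rejects external observations that do not raise the validation score, whereas random selection may add them regardless. Scoring every candidate at every step is quadratic in the pool size, so a practical version would likely need to subsample candidates.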
