TL;DR
This paper systematically investigates how class imbalance affects CNN classification performance on three benchmark datasets (MNIST, CIFAR-10 and ImageNet) and compares common mitigation methods. It finds oversampling to be the most effective approach, and one that does not cause overfitting.
Contribution
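It provides a systematic analysis of how class imbalance affects CNNs and evaluates four mitigation strategies (oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities). It shows that oversampling remains effective in a deep learning setting.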
Findings
Class imbalance negatively impacts CNN classification performance.
Oversampling consistently outperforms other methods in mitigating imbalance.
Oversampling can fully eliminate the imbalance in the training data without causing overfitting in CNNs.
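To make the oversampling finding concrete, here is a minimal sketch of random oversampling. It assumes the simplest variant: randomly chosen samples of each smaller class are duplicated until every class matches the size of the largest class. The function name `random_oversample` and the toy data are hypothetical illustrations, not the paper's code.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    samples: Sequence[T], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[T], List[Hashable]]:
    """Hypothetical helper: duplicate randomly chosen samples of every smaller
    class until each class has as many samples as the largest one."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not samples:
        return [], []
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())

    out_x: List[T] = []
    out_y: List[Hashable] = []
    for y, xs in by_class.items():
        # Keep every original sample once, then draw extra copies
        # with replacement until the class reaches the target size.
        extra = rng.choices(xs, k=target - len(xs))
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical 8:2 toy split; after oversampling both classes have 8 samples.
xs = list(range(10))
ys = ["a"] * 8 + ["b"] * 2
ox, oy = random_oversample(xs, ys)
print(Counter(oy))  # Counter({'a': 8, 'b': 8})
```

The output is grouped by class, so in practice it would be shuffled (for example by the training data loader) before batching. Duplicating whole samples keeps the sketch framework-agnostic; with image data the same index-level resampling is often done inside a sampler instead of copying arrays.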
Abstract
In this study, we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. In our study, we use three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is area under the receiver operating characteristic curve (ROC AUC) adjusted to multi-class tasks since overall accuracy metric is…
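One way to read "thresholding that compensates for prior class probabilities" is as a test-time correction: divide each class's network output by that class's estimated prior (its relative frequency in the training set) and take the argmax of the corrected scores. The sketch below assumes exactly that formulation and softmax outputs. The names `class_priors` and `prior_corrected_predict` are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Sequence


def class_priors(train_labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Estimate each prior p(c) as the class's relative frequency in training."""
    counts = Counter(train_labels)
    n = len(train_labels)
    return {c: k / n for c, k in counts.items()}


def prior_corrected_predict(
    probs: Sequence[float],
    classes: Sequence[Hashable],
    priors: Mapping[Hashable, float],
) -> Hashable:
    """Divide each softmax output p(c | x) by the prior p(c) and return the
    class with the largest corrected score."""
    scores = [p / priors[c] for p, c in zip(probs, classes)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best]


# Hypothetical 9:1 training split and one softmax output vector.
priors = class_priors(["a"] * 9 + ["b"])
print(prior_corrected_predict([0.7, 0.3], ["a", "b"], priors))  # b
```

A plain argmax over `[0.7, 0.3]` would predict the majority class `a`; after the correction, `b` wins because 0.3 / 0.1 = 3.0 exceeds 0.7 / 0.9 ≈ 0.78. Unlike oversampling, this changes only the decision rule at inference time and leaves training untouched.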
