Understanding CNN Fragility When Learning With Imbalanced Data
Damien Dablain, Kristen N. Jacobson, Colin Bellinger, Mark Roberts, and Nitesh Chawla

TL;DR
This paper investigates how CNNs process latent features in imbalanced data, revealing that their ability to generalize to minority classes depends on class feature diversity and magnitude, which has implications for re-sampling strategies.
Contribution
It introduces a feature-based analysis of CNNs on imbalanced data, highlighting the importance of latent class diversity and feature magnitude for generalization.
Findings
CNNs learn a limited set of top-K features per class, and this set varies with data balance.
Generalization depends on feature magnitudes matching between the training and test sets.
Latent class diversity is crucial, beyond just class sample size.
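The findings above can be illustrated with a minimal sketch, assuming feature embeddings have already been extracted from a trained CNN as a non-negative array of shape (samples, features). The helper names `class_top_k` and `magnitude_gap`, the use of the per-class mean as the magnitude, and the toy arrays are all assumptions made for illustration; they are not taken from the paper, whose exact procedure may differ.

```python
import numpy as np

def class_top_k(features, labels, k=3):
    """For each class, return (indices, mean magnitudes) of the k features
    with the largest mean value over that class's samples."""
    result = {}
    for c in np.unique(labels):
        mean_fe = features[labels == c].mean(axis=0)
        idx = np.argsort(mean_fe)[::-1][:k]  # largest mean magnitude first
        result[int(c)] = (idx, mean_fe[idx])
    return result

def magnitude_gap(train_fe, train_y, test_fe, test_y, k=3):
    """For each class, mean training top-K magnitude minus the mean test
    magnitude measured on the same feature indices."""
    gaps = {}
    for c, (idx, train_mag) in class_top_k(train_fe, train_y, k).items():
        test_mag = test_fe[test_y == c][:, idx].mean(axis=0)
        gaps[c] = float(train_mag.mean() - test_mag.mean())
    return gaps

# Toy embeddings (hypothetical values): class 0 is the majority, class 1 the minority.
train_fe = np.array([[5.0, 1.0, 0.0, 2.0],
                     [7.0, 1.0, 0.0, 2.0],
                     [0.0, 3.0, 9.0, 1.0]])
train_y = np.array([0, 0, 1])
test_fe = np.array([[3.0, 1.0, 0.0, 1.0],
                    [0.0, 1.0, 4.0, 1.0]])
test_y = np.array([0, 1])

top = class_top_k(train_fe, train_y, k=2)
gaps = magnitude_gap(train_fe, train_y, test_fe, test_y, k=2)
print(top)
print(gaps)
```

A large positive gap for a class suggests that its test samples activate the learned top-K features more weakly than its training samples did, which is the kind of train/test magnitude mismatch the findings associate with poor generalization.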
Abstract
Convolutional neural networks (CNNs) have achieved impressive results on imbalanced image data, but they still have difficulty generalizing to minority classes and their decisions are difficult to interpret. These problems are related because the method by which CNNs generalize to minority classes, which requires improvement, is wrapped in a blackbox. To demystify CNN decisions on imbalanced data, we focus on their latent features. Although CNNs embed the pattern knowledge learned from a training set in model parameters, the effect of this knowledge is contained in feature and classification embeddings (FE and CE). These embeddings can be extracted from a trained model and their global, class properties (e.g., frequency, magnitude and identity) can be analyzed. We find that important information regarding the ability of a neural network to generalize to minority classes resides in the…
Taxonomy
Topics: Imbalanced Data Classification Techniques · Digital Imaging for Blood Diseases
Methods: Test
