Efficient Augmentation for Imbalanced Deep Learning
Damien Dablain, Colin Bellinger, Bartosz Krawczyk, Nitesh Chawla

TL;DR
This paper introduces an efficient three-phase CNN training framework with a novel data augmentation method, EOS, to improve classification accuracy on imbalanced datasets by reducing the generalization gap for minority classes.
Contribution
It proposes a new training framework and EOS augmentation technique that enhance minority class recognition while being computationally efficient.
Findings
The framework with EOS improves classification accuracy over existing resampling methods.
EOS reduces the generalization gap for minority classes.
The framework is more computationally efficient than SMOTE and GAN-based oversampling.
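To make the "generalization gap" in these findings concrete, here is a minimal sketch of one way such a gap could be computed per class from feature embeddings. The metric is an assumption chosen for illustration, not the paper's definition: it averages how far test embeddings fall outside the per-dimension [min, max] range observed in training. The function name `class_generalization_gap` and its arguments are hypothetical.

```python
import numpy as np

def class_generalization_gap(train_emb, train_y, test_emb, test_y):
    """Assumed proxy for a per-class generalization gap.

    For each class, measure how far test embeddings fall outside the
    per-dimension [min, max] range of the training embeddings, averaged
    over test samples and dimensions. This is an illustrative metric,
    not the definition used in the paper.
    """
    gaps = {}
    for c in np.unique(train_y):
        tr = train_emb[train_y == c]
        te = test_emb[test_y == c]
        if len(te) == 0:
            continue  # no test samples for this class
        lo, hi = tr.min(axis=0), tr.max(axis=0)
        below = np.clip(lo - te, 0.0, None)  # amount below the train minimum
        above = np.clip(te - hi, 0.0, None)  # amount above the train maximum
        gaps[int(c)] = float((below + above).mean())
    return gaps

# Toy data: class 0 has two training points, class 1 has only one.
train_emb = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
train_y = np.array([0, 0, 1])
test_emb = np.array([[0.5, 0.5], [2.0, 0.5]])
test_y = np.array([0, 1])
gaps = class_generalization_gap(train_emb, train_y, test_emb, test_y)
```

Under this proxy, a class with few training samples covers a narrow range of the embedding space, so its test samples are more likely to land outside that range and produce a larger gap.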
Abstract
Deep learning models tend to memorize training data, which hurts their ability to generalize to under-represented classes. We empirically study a convolutional neural network's internal representation of imbalanced image data and measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes. This insight enables us to design an efficient three-phase CNN training framework for imbalanced data. The framework involves training the network end-to-end on imbalanced data to learn accurate feature embeddings, performing data augmentation in the learned embedded space to balance the train distribution, and fine-tuning the classifier head on the embedded balanced training data. We propose Expansive Over-Sampling (EOS) as a data augmentation technique to utilize in the training framework. EOS forms synthetic…
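The second and third phases of the framework described above can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the authors' implementation: phase 1 (end-to-end CNN training) is assumed to have already produced the embedding matrix `emb`; the oversampler is a SMOTE-style interpolation between same-class neighbours used as a stand-in, since the description of EOS is cut off here; and the classifier head is assumed to be a single linear softmax layer. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def oversample_embeddings(emb, y, k=5, seed=0):
    """Phase 2 stand-in: balance classes in embedding space using
    SMOTE-style interpolation between same-class neighbours (NOT EOS)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    new_x, new_y = [emb], [y]
    for c, n in zip(classes, counts):
        need = target - n
        if need == 0:
            continue
        pts = emb[y == c]
        if n < 2:
            # A single sample has no neighbour to interpolate with: duplicate it.
            new_x.append(np.repeat(pts, need, axis=0))
            new_y.append(np.full(need, c))
            continue
        # Pairwise distances within the class; exclude self-matches.
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        nn = np.argsort(d, axis=1)[:, :min(k, n - 1)]
        base = rng.integers(0, n, size=need)
        nb = nn[base, rng.integers(0, nn.shape[1], size=need)]
        lam = rng.random((need, 1))
        # Synthetic point on the segment between a sample and a neighbour.
        new_x.append(pts[base] + lam * (pts[nb] - pts[base]))
        new_y.append(np.full(need, c))
    return np.concatenate(new_x), np.concatenate(new_y)

def finetune_head(emb, y, n_classes, lr=0.1, epochs=200):
    """Phase 3: fit only a linear softmax head on the balanced embeddings.
    Assumes integer labels in the range 0..n_classes-1."""
    W = np.zeros((emb.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = emb @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(y)  # gradient of mean cross-entropy
        W -= lr * emb.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy embeddings standing in for phase 1 output: 100 majority, 5 minority.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(3, 1, (5, 8))])
y = np.array([0] * 100 + [1] * 5)

bal_emb, bal_y = oversample_embeddings(emb, y)
W, b = finetune_head(bal_emb, bal_y, n_classes=2)
train_acc = float(((bal_emb @ W + b).argmax(axis=1) == bal_y).mean())
```

Working in the embedded space means the oversampler handles low-dimensional vectors rather than raw images, and only the head's weights are updated in the last phase, which is where the efficiency argument comes from.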
Taxonomy
Topics: Imbalanced Data Classification Techniques · Vehicle License Plate Recognition · Electricity Theft Detection Techniques
Methods: Test · Synthetic Minority Over-sampling Technique
