From SMOTE to Mixup for Deep Imbalanced Classification
Wei-Chao Cheng, Tan-Ha Mai, Hsuan-Tien Lin

TL;DR
This paper enhances data augmentation techniques for deep imbalanced classification by integrating SMOTE with Mixup, introducing a margin-aware Mixup method that improves minority class generalization and achieves state-of-the-art results.
Contribution
It unifies traditional SMOTE and modern Mixup techniques, proposing a margin-aware Mixup to explicitly handle class imbalance in deep learning.
Findings
Mixup improves generalization by implicitly achieving uneven margins between majority and minority classes.
Margin-aware Mixup explicitly enhances minority class performance.
The proposed method achieves state-of-the-art results on imbalanced datasets.
Abstract
Given imbalanced data, it is hard to train a good classifier using deep learning because of the poor generalization of minority classes. Traditionally, the well-known synthetic minority oversampling technique (SMOTE) for data augmentation, a data mining approach for imbalanced learning, has been used to improve this generalization. However, it is unclear whether SMOTE also benefits deep learning. In this work, we study why the original SMOTE is insufficient for deep learning, and enhance SMOTE using soft labels. Connecting the resulting soft SMOTE with Mixup, a modern data augmentation technique, leads to a unified framework that puts traditional and modern data augmentation techniques under the same umbrella. A careful study within this framework shows that Mixup improves generalization by implicitly achieving uneven margins between majority and minority classes. We then propose a…
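As a rough illustration of the two augmentation schemes the abstract connects, the sketch below contrasts classic SMOTE with Mixup on NumPy arrays. SMOTE interpolates only the inputs and keeps a hard minority label. Mixup interpolates inputs and one-hot labels with the same weight. The function names, parameters, and toy data are illustrative assumptions, not taken from the paper's code, and the authors' soft-label SMOTE and margin-aware Mixup variants are not reproduced here.

```python
import numpy as np


def smote_sample(x_i, minority_X, k, rng):
    """Classic SMOTE step: move x_i part of the way toward one of its
    k nearest minority-class neighbors. The synthetic point keeps the
    (hard) minority label, so only the input is returned."""
    dists = np.linalg.norm(minority_X - x_i, axis=1)
    # Assumes x_i is a row of minority_X, so position 0 of the sorted
    # order is x_i itself (distance 0) and is skipped.
    neighbors = np.argsort(dists)[1:k + 1]
    x_nn = minority_X[rng.choice(neighbors)]
    u = rng.uniform(0.0, 1.0)
    return x_i + u * (x_nn - x_i)


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Mixup step: interpolate inputs AND one-hot labels with the same
    weight lam, which yields a soft label."""
    x = lam * x_i + (1.0 - lam) * x_j
    y = lam * y_i + (1.0 - lam) * y_j
    return x, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy minority class with four 2-D points (hypothetical data).
    minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
    print("SMOTE sample:", smote_sample(minority[0], minority, k=2, rng=rng))

    # Mixup between a class-0 point and a class-1 point; in standard
    # Mixup the weight is drawn from a Beta(alpha, alpha) distribution.
    lam = rng.beta(1.0, 1.0)
    x_mix, y_mix = mixup_pair(
        np.array([0.0, 0.0]), np.array([1.0, 0.0]),
        np.array([2.0, 2.0]), np.array([0.0, 1.0]),
        lam,
    )
    print("Mixup sample:", x_mix, "soft label:", y_mix)
```

The key difference visible here is the label: `smote_sample` never changes it, while `mixup_pair` returns a soft label, which is the property the abstract's soft SMOTE borrows to bring the two techniques under one framework.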
Taxonomy
Topics: Imbalanced Data Classification Techniques · Electricity Theft Detection Techniques · Text and Document Classification Technologies
Methods: Mixup · Synthetic Minority Over-sampling Technique
