TL;DR
This paper introduces SMOTE (Synthetic Minority Over-sampling Technique), which improves classifier performance on imbalanced datasets by creating synthetic minority-class examples. Combined with under-sampling of the majority class, it yields better sensitivity and ROC performance than under-sampling alone.
Contribution
The paper presents SMOTE, a novel over-sampling method that generates synthetic minority class examples to improve classifier performance on imbalanced data.
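To make the idea of a "synthetic minority class example" concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes numeric features, Euclidean distance, and a hypothetical function name `smote`. For brevity it also draws the base sample at random, rather than generating a fixed number of synthetic points per minority example. Each new point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import numpy as np

def smote(minority, n_synthetic, k=5, seed=0):
    """Illustrative sketch: interpolate between minority samples and
    their k nearest minority-class neighbours (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    if k < 1:
        raise ValueError("need at least two minority samples")

    # Pairwise squared distances among minority samples only.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Assumption: base samples are drawn at random (a simplification).
    base = rng.integers(0, n, size=n_synthetic)
    picked = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # position along the segment, in [0, 1)
    return minority[base] + gap * (minority[picked] - minority[base])

X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote(X_min, n_synthetic=8, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always falls inside the region those samples span, which is what distinguishes this from over-sampling by plain duplication.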
Findings
SMOTE improves ROC AUC scores over traditional under-sampling.
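Combining SMOTE with under-sampling of the majority class outperforms under-sampling alone in ROC space, as well as varying the loss ratios in Ripper or the class priors in Naive Bayes.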
Experiments show enhanced classifier sensitivity with SMOTE.
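The under-sampling half of that combination can be sketched as follows. This is an illustrative example only: the function name `undersample_majority` and the `keep_fraction` parameter are hypothetical, and the choice of uniform random sampling without replacement is an assumption, not a detail taken from the paper.

```python
import numpy as np

def undersample_majority(X, y, majority_label, keep_fraction, seed=0):
    """Illustrative sketch: keep a random fraction of the majority class
    and every example from the other class(es)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_label)
    other_idx = np.flatnonzero(y != majority_label)
    if len(maj_idx) == 0:
        return X, y  # nothing to under-sample

    # Assumption: uniform sampling without replacement.
    n_keep = max(1, int(round(keep_fraction * len(maj_idx))))
    kept = rng.choice(maj_idx, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([kept, other_idx]))  # preserve original order
    return X[idx], y[idx]

y = np.array([0] * 90 + [1] * 10)          # 90 "normal", 10 "abnormal"
X = np.arange(len(y), dtype=float).reshape(-1, 1)
X_small, y_small = undersample_majority(X, y, majority_label=0, keep_fraction=0.5)
```

In a combined pipeline, the minority class would be over-sampled with synthetic examples and the majority class reduced in this way, on the training data only, before the classifier is fit.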
Abstract
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper…
Taxonomy
Methods: Synthetic Minority Over-sampling Technique (SMOTE).
