SMOTE-ENC: A novel SMOTE-based method to generate synthetic data for nominal and continuous features

Mimi Mukherjee; Matloob Khushi

arXiv:2103.07612·cs.LG·March 16, 2021

TL;DR

SMOTE-ENC introduces a new synthetic data generation technique that encodes nominal features as numeric values, improving over SMOTE-NC especially for datasets with many categorical features or purely nominal data.

Contribution

The paper proposes SMOTE-ENC, a novel over-sampling method that encodes nominal features numerically, addressing limitations of SMOTE-NC and applicable to both mixed and nominal-only datasets.

Findings

01. SMOTE-ENC outperforms SMOTE-NC in datasets with many nominal features.

02. The method effectively handles datasets with only nominal features.

03. SMOTE-ENC improves classification accuracy in imbalanced datasets.

Abstract

Real-world datasets are often heavily skewed, with some classes significantly outnumbered by others. In these situations, machine learning algorithms fail to achieve substantial efficacy when predicting the under-represented instances. To solve this problem, many variations of the synthetic minority over-sampling technique (SMOTE) have been proposed to balance datasets with continuous features. However, for datasets with both nominal and continuous features, SMOTE-NC is the only SMOTE-based over-sampling technique available to balance the data. In this paper, we present a novel minority over-sampling method, SMOTE-ENC (SMOTE - Encoded Nominal and Continuous), in which nominal features are encoded as numeric values and the difference between two such numeric values reflects the amount of change of association with the minority class. Our experiments show that the classification…
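
The encoding idea in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the scoring rule used here, (observed - expected) / expected minority count per category, is an assumption chosen only to show how a nominal value might be mapped to a number whose differences reflect association with the minority class. The function name `encode_nominal` and the toy data are hypothetical.

```python
from collections import Counter


def encode_nominal(values, labels, minority_label):
    """Map each category to a score reflecting its association with the minority class.

    Assumed scoring rule (illustrative, not taken from the paper's text above):
        score = (observed - expected) / expected
    where `observed` is the number of minority rows with that category and
    `expected` is the category count times the overall minority rate.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")

    total = len(labels)
    minority_rate = sum(1 for y in labels if y == minority_label) / total
    if minority_rate == 0:
        raise ValueError("no rows carry the minority label")

    category_counts = Counter(values)
    minority_counts = Counter(
        v for v, y in zip(values, labels) if y == minority_label
    )

    encoding = {}
    for category, count in category_counts.items():
        expected = count * minority_rate      # minority rows expected by chance
        observed = minority_counts[category]  # minority rows actually seen
        encoding[category] = (observed - expected) / expected
    return encoding


# Toy data: category "a" holds every minority row, category "b" holds none.
values = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
encoding = encode_nominal(values, labels, minority_label=1)
print(encoding)
```

Under this assumed rule, a category that is over-represented in the minority class gets a positive score and an under-represented one gets a negative score, so the gap between two encoded values grows with the difference in their minority-class association. Once encoded, the nominal column could be passed through a distance-based over-sampler alongside the continuous features.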

Code & Models

Repositories

pwc-1/Paper-9/tree/main/7/SMOTE
mindspore