Enhancement Encoding: A Novel Imbalanced Classification Approach via Encoding the Training Labels

Jia-Chen Zhao

arXiv:2208.11056 · cs.LG · March 29, 2023

TL;DR

This paper introduces enhancement encoding, a novel label encoding method designed specifically for imbalanced classification tasks, which improves minority class performance by combining re-weighting and cost-sensitiveness.

Contribution

It proposes enhancement encoding, a new label encoding technique for imbalanced data, and introduces a soft-confusion matrix to reduce validation costs.

Findings

01

Enhancement encoding significantly improves minority class accuracy.

02

The method outperforms traditional one-hot encoding in imbalanced scenarios.

03

It is effective across different loss functions.

Abstract

Class imbalance, also called long-tailed distribution, is a common problem in machine-learning classification tasks. When it occurs, the minority-class data are overwhelmed by the majority, which poses a considerable challenge for data science. To address class imbalance, researchers have proposed many methods: some rebalance the data set (SMOTE), others refine the loss function (Focal Loss), and some have noted that the value of labels influences class-imbalanced learning (Yang and Xu, "Rethinking the value of labels for improving class-imbalanced learning," NeurIPS 2020), but no one has yet changed the way the labels themselves are encoded. Today, the most prevalent label encoding technique is one-hot encoding, owing to its good performance in general settings. However, it is not a good choice for imbalanced data, because the…

Taxonomy

Topics: Imbalanced Data Classification Techniques · Currency Recognition and Detection · Electricity Theft Detection Techniques