A study on cost behaviors of binary classification measures in class-imbalanced problems
Bao-Gang Hu, Wei-Ming Dong

TL;DR
This paper analyzes the cost behaviors of twelve binary classification measures in class-imbalanced problems, revealing their suitability based on theoretical cost functions related to class imbalance ratios.
Contribution
It introduces a new perspective by deriving cost functions for various measures, explaining their effectiveness or ineffectiveness in imbalanced classification tasks.
Findings
G-means of accuracy rates and BER are suitable measures.
F1, G-means of recall and precision, MCC, and Kappa are unsuitable.
Cost functions explain why some measures handle class imbalance better.
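The findings above name several measures without defining them. The Python sketch below computes six of the twelve measures from a binary confusion matrix, using their standard textbook definitions. It is illustrative only and is not the authors' code. It makes three assumptions:

- The small (minority) class is treated as the positive class.
- No row or column of the confusion matrix is empty.
- "Kappa" means Cohen's kappa.

The `Confusion` class and all function names are hypothetical helpers.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix (hypothetical helper); positive = minority class."""
    tp: int  # minority samples predicted as minority
    fn: int  # minority samples predicted as majority
    fp: int  # majority samples predicted as minority
    tn: int  # majority samples predicted as majority


def rates(c):
    """Return (TPR, TNR, precision); assumes no empty row or column."""
    tpr = c.tp / (c.tp + c.fn)        # recall, i.e. accuracy rate on the minority class
    tnr = c.tn / (c.tn + c.fp)        # accuracy rate on the majority class
    precision = c.tp / (c.tp + c.fp)
    return tpr, tnr, precision


def f1(c):
    # Harmonic mean of precision and recall, written in count form.
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def gmean_accuracy(c):
    # G-mean of the two per-class accuracy rates.
    tpr, tnr, _ = rates(c)
    return math.sqrt(tpr * tnr)


def gmean_recall_precision(c):
    # G-mean of recall and precision.
    tpr, _, precision = rates(c)
    return math.sqrt(tpr * precision)


def ber(c):
    # Balance error rate: mean of the two per-class error rates.
    tpr, tnr, _ = rates(c)
    return 0.5 * ((1 - tpr) + (1 - tnr))


def mcc(c):
    # Matthews correlation coefficient; returns 0.0 when the denominator vanishes.
    num = c.tp * c.tn - c.fp * c.fn
    den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return num / den if den else 0.0


def kappa(c):
    # Cohen's kappa: observed agreement corrected for chance agreement.
    n = c.tp + c.fn + c.fp + c.tn
    p_observed = (c.tp + c.tn) / n
    p_chance = ((c.tp + c.fn) * (c.tp + c.fp) + (c.tn + c.fp) * (c.tn + c.fn)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Made-up example: 10 minority and 990 majority samples.
    c = Confusion(tp=8, fn=2, fp=10, tn=980)
    for name, metric in [("F1", f1), ("G-mean (TPR, TNR)", gmean_accuracy),
                         ("G-mean (recall, precision)", gmean_recall_precision),
                         ("BER", ber), ("MCC", mcc), ("Kappa", kappa)]:
        print(f"{name}: {metric(c):.4f}")
```

Two choices in this sketch matter. BER is an error rate, so lower is better, while the other five measures are scores where higher is better. F1 and the recall/precision G-mean never use `tn`, so they ignore how well the majority class is handled.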
Abstract
This work investigates the cost behaviors of binary classification measures in the context of class-imbalanced problems. Twelve performance measures are studied, including the F measure, G-means in terms of accuracy rates and in terms of recall and precision, the balance error rate (BER), the Matthews correlation coefficient (MCC), and the Kappa coefficient. A new perspective is presented for these measures by revealing their cost functions with respect to the class imbalance ratio. The measures fall into four types of cost functions. These functions provide a theoretical understanding of why some measures are suitable for dealing with class-imbalanced problems. Based on their cost functions, we are able to conclude that G-means of accuracy rates and BER are suitable measures because they show "proper" cost behaviors in terms of "a misclassification from a small class will cause a greater cost than that…
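The abstract's criterion says a misclassification from the small class should cost more than one from the large class. A one-error perturbation makes this concrete. This is an illustrative sketch under our own assumptions, not the cost functions derived in the paper. The setup is:

- $n_+$ minority (positive) samples and $n_- > n_+$ majority (negative) samples.
- An imbalance ratio $r = n_-/n_+$.
- A perfect classifier as the starting point, with one error then introduced in either class.

The calculation compares how much a single error changes BER and F1.

```latex
% Definitions in terms of confusion-matrix counts
\mathrm{BER} = \tfrac{1}{2}\left(\frac{FN}{n_+} + \frac{FP}{n_-}\right), \qquad
F_1 = \frac{2\,TP}{2\,TP + FP + FN}

% BER: one minority-class error (FN = 1) vs. one majority-class error (FP = 1)
\Delta\mathrm{BER}_{\text{minority}} = \frac{1}{2n_+}, \qquad
\Delta\mathrm{BER}_{\text{majority}} = \frac{1}{2n_-}, \qquad
\frac{\Delta\mathrm{BER}_{\text{minority}}}{\Delta\mathrm{BER}_{\text{majority}}} = \frac{n_-}{n_+} = r

% F1: FN = 1 gives TP = n_+ - 1; FP = 1 gives TP = n_+
\Delta F_{1,\text{minority}} = 1 - \frac{2(n_+ - 1)}{2n_+ - 1} = \frac{1}{2n_+ - 1}, \qquad
\Delta F_{1,\text{majority}} = 1 - \frac{2n_+}{2n_+ + 1} = \frac{1}{2n_+ + 1}

\frac{\Delta F_{1,\text{minority}}}{\Delta F_{1,\text{majority}}} = \frac{2n_+ + 1}{2n_+ - 1}
\quad \text{(independent of } n_- \text{, and } \to 1 \text{ as } n_+ \text{ grows)}
```

**Result.** Under these assumptions, BER penalizes a minority-class error exactly $r$ times more than a majority-class error. The F1 cost ratio, by contrast, does not depend on the imbalance ratio for fixed $n_+$. This is consistent with, though not a substitute for, the paper's conclusion that BER is suitable and F1 is not.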
Taxonomy
Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction · Text and Document Classification Technologies
