Robust performance metrics for imbalanced classification problems

Hajo Holzmann; Bernhard Klar

arXiv:2404.07661 · stat.ML · April 12, 2024 · 1 cites

Open Access

TL;DR

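This paper shows that common binary classification metrics (the F-score, the Jaccard similarity coefficient, Matthews' correlation coefficient) favour classifiers that ignore the minority class as class imbalance grows, and proposes robust modifications of the F-score and the MCC that keep the TPR bounded away from zero.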

Contribution

It introduces robust modifications of the F-score and the MCC for which the TPR stays bounded away from zero, even as class imbalance increases.

Findings

01 Traditional metrics favor classifiers ignoring the minority class.

02 Robust metrics keep TPR bounded away from zero in highly imbalanced scenarios.

03 Numerical experiments demonstrate improved metric behavior on real and simulated data.

Abstract

We show that established performance metrics in binary classification, such as the F-score, the Jaccard similarity coefficient or Matthews' correlation coefficient (MCC), are not robust to class imbalance in the sense that if the proportion of the minority class tends to $0$, the true positive rate (TPR) of the Bayes classifier under these metrics tends to $0$ as well. Thus, in imbalanced classification problems, these metrics favour classifiers which ignore the minority class. To alleviate this issue we introduce robust modifications of the F-score and the MCC for which, even in strongly imbalanced settings, the TPR is bounded away from $0$. We numerically illustrate the behaviour of the various performance metrics in simulations as well as on a credit default data set. We also discuss connections to the ROC and precision-recall curves and give recommendations on how to combine their…
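
The non-robustness claim for the F-score can be illustrated with a minimal numerical sketch. It is not the authors' code or experimental setup; every modelling choice here is an assumption made for illustration: the minority class is N(1, 1), the majority class is N(0, 1), the classifier predicts the minority class when x exceeds a threshold t, and the helper names (upper_tail, f1_score, tpr_at_f1_optimum) are hypothetical. Because the likelihood ratio is monotone in x in this toy model, a grid search over t should approximate the F1-optimal rule at the population level. The robust modifications proposed in the paper are not reproduced.

```python
import math


def upper_tail(x):
    """P(Z > x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def f1_score(t, pi, shift=1.0):
    """Population F1 of the rule 'predict minority if x > t'.

    Assumed toy model: minority ~ N(shift, 1) with prior pi,
    majority ~ N(0, 1) with prior 1 - pi.
    """
    tpr = upper_tail(t - shift)   # true positive rate at threshold t
    fpr = upper_tail(t)           # false positive rate at threshold t
    tp = pi * tpr
    fp = (1.0 - pi) * fpr
    fn = pi * (1.0 - tpr)
    return 2.0 * tp / (2.0 * tp + fp + fn)


def tpr_at_f1_optimum(pi, shift=1.0):
    """TPR of the threshold that maximises population F1 (grid search on t)."""
    grid = [-5.0 + 0.01 * k for k in range(1501)]  # thresholds from -5 to 10
    best_t = max(grid, key=lambda t: f1_score(t, pi, shift))
    return upper_tail(best_t - shift)


# Shrink the minority proportion and report the TPR of the F1-optimal rule.
for pi in (0.5, 0.1, 0.01, 0.001):
    print(f"pi = {pi:<6} TPR at F1-optimal threshold = {tpr_at_f1_optimum(pi):.3f}")
```

In this toy model the printed TPR falls as pi shrinks, in line with the behaviour the abstract describes for the F-score; the speed of the decline depends on the assumed class separation (shift).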

Taxonomy

Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction · Advanced Statistical Methods and Models