The MCC-F1 curve: a performance evaluation technique for binary classification

Chang Cao; Davide Chicco; Michael M. Hoffman

arXiv:2006.11278·stat.ML·June 23, 2020·48 cites

TL;DR

This paper introduces the MCC-F1 curve, a performance evaluation method for binary classifiers that combines the Matthews correlation coefficient (MCC) and the F1 score to address limitations of ROC and PR curves, especially on imbalanced datasets.

Contribution

The paper proposes the MCC-F1 curve and the MCC-F1 metric, which together evaluate classifiers across the whole range of classification thresholds, and provides an R package that plots the curves and calculates related metrics.

Findings

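01 The MCC-F1 curve differentiates good and bad classifiers more clearly than ROC and PR curves, even on imbalanced data.

02 The MCC-F1 metric summarizes classifier performance across the whole range of classification thresholds in a single value.

03 The accompanying R package plots MCC-F1 curves and calculates related metrics.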

Abstract

Many fields use the ROC curve and the PR curve as standard evaluations of binary classification methods. Analysis of ROC and PR, however, often gives misleading and inflated performance evaluations, especially with an imbalanced ground truth. Here, we demonstrate the problems with ROC and PR analysis through simulations, and propose the MCC-F1 curve to address these drawbacks. The MCC-F1 curve combines two informative single-threshold metrics, MCC and the F1 score. The MCC-F1 curve more clearly differentiates good and bad classifiers, even with imbalanced ground truths. We also introduce the MCC-F1 metric, which provides a single value that integrates many aspects of classifier performance across the whole range of classification thresholds. Finally, we provide an R package that plots MCC-F1 curves and calculates related metrics.
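As a rough illustration of how a curve built from MCC and F1 can be traced across thresholds, the sketch below computes both single-threshold metrics at every distinct score value. It is not the authors' R package and does not implement the MCC-F1 metric itself. All function names are hypothetical, and several choices are assumptions made for this example: MCC is rescaled from [-1, 1] to [0, 1] as (MCC + 1) / 2 so that it shares a range with F1, any metric whose denominator is zero is reported as 0, and the "best" threshold is taken to be the point nearest to perfect performance at (1, 1).

```python
import math


def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, tn, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 if undefined (assumption)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined (assumption)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcc_f1_points(y_true, scores):
    """Return (threshold, F1, unit-normalized MCC) for each distinct score."""
    points = []
    for threshold in sorted(set(scores)):
        counts = confusion_counts(y_true, scores, threshold)
        unit_mcc = (mcc(*counts) + 1) / 2  # rescale [-1, 1] to [0, 1]
        points.append((threshold, f1_score(*counts), unit_mcc))
    return points


def closest_to_perfect(points):
    """Pick the point with the smallest Euclidean distance to (1, 1)."""
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    for threshold, f1, unit_mcc in mcc_f1_points(labels, scores):
        print(f"threshold={threshold:.2f}  F1={f1:.3f}  unit MCC={unit_mcc:.3f}")
    print("closest to (1, 1):", closest_to_perfect(mcc_f1_points(labels, scores)))
```

Plotting the F1 and unit-normalized MCC values of these points against each other gives a curve of the kind the abstract describes. For real analyses, the R package mentioned above is the reference implementation.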

Taxonomy

Topics: Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications · Anomaly Detection Techniques and Applications