Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification

Jing Li

arXiv:2408.10193 · stat.ML · December 17, 2024

TL;DR

This study shows that the Area Under the ROC Curve (AUC) provides the most consistent evaluation of binary classification models across different data prevalences, outperforming other metrics in stability and ranking consistency.

Contribution

The study demonstrates that AUC, which considers all decision thresholds, is the most stable and reliable of the evaluated metrics across varying data prevalences in binary classification.

Findings

01

AUC has the smallest variance in evaluating models across prevalence changes.

02

Metrics less influenced by prevalence provide more consistent model rankings.

03

Considering all decision thresholds reduces evaluation variance.
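
The first finding can be illustrated with a small simulation. The sketch below is not the paper's experimental setup: it assumes a hypothetical scorer whose outputs are drawn from N(1, 1) for positives and N(0, 1) for negatives, keeps the total sample size fixed, and varies only the prevalence. It then compares how much AUC and a single-threshold metric (precision at an arbitrarily chosen threshold of 0.5) move across prevalences. All function names and parameter values are illustrative assumptions.

```python
import random
import statistics


def auc(scores_pos, scores_neg):
    """Rank-based AUC: the chance that a random positive outscores a random negative.

    Assumes continuous scores, so ties are not handled specially.
    """
    labeled = sorted([(s, 1) for s in scores_pos] + [(s, 0) for s in scores_neg])
    # Sum of the 1-based ranks held by positive examples (Mann-Whitney U statistic).
    rank_sum = sum(rank for rank, (_, y) in enumerate(labeled, start=1) if y == 1)
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def precision_at(scores_pos, scores_neg, threshold):
    """Precision when every score >= threshold is predicted positive."""
    tp = sum(s >= threshold for s in scores_pos)
    fp = sum(s >= threshold for s in scores_neg)
    return tp / (tp + fp) if tp + fp else 0.0


def simulate(prevalences=(0.05, 0.1, 0.3, 0.5, 0.7), n=20000, seed=0):
    """Return (prevalence, AUC, precision) rows for a fixed total sample size n."""
    rng = random.Random(seed)
    rows = []
    for p in prevalences:
        n_pos = int(round(n * p))
        # Hypothetical scorer: class-conditional score distributions stay fixed;
        # only the class mix changes.
        pos = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
        neg = [rng.gauss(0.0, 1.0) for _ in range(n - n_pos)]
        rows.append((p, auc(pos, neg), precision_at(pos, neg, 0.5)))
    return rows


rows = simulate()
auc_spread = statistics.pstdev([r[1] for r in rows])
prec_spread = statistics.pstdev([r[2] for r in rows])

for p, a, prec in rows:
    print(f"prevalence={p:.2f}  AUC={a:.3f}  precision@0.5={prec:.3f}")
print(f"std of AUC across prevalences:       {auc_spread:.4f}")
print(f"std of precision across prevalences: {prec_spread:.4f}")
```

Under these assumed score distributions the population AUC is Φ(1/√2) ≈ 0.76 at every prevalence, so the observed AUC values differ only by sampling noise, whereas precision at a fixed threshold shifts with the class mix. This toy setup only shows the mechanism; it does not reproduce the paper's 156 scenarios, 18 metrics, or trained models.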

Abstract

The proper use of model evaluation metrics is important for model evaluation and model selection in binary classification tasks. This study investigates how consistent different metrics are at evaluating models across data of different prevalence while the relationships between different variables and the sample size are kept constant. Analyzing 156 data scenarios, 18 model evaluation metrics, and five commonly used machine learning models as well as a naive random guess model, I find that evaluation metrics that are less influenced by prevalence offer more consistent evaluation of individual models and more consistent ranking of a set of models. In particular, Area Under the ROC Curve (AUC), which takes all decision thresholds into account when evaluating models, has the smallest variance in evaluating individual models and the smallest variance in ranking a set of models. A close…

Taxonomy

Topics: Artificial Intelligence in Healthcare · Imbalanced Data Classification Techniques

Methods: Sparse Evolutionary Training