Classifier uncertainty: evidence, potential impact, and probabilistic treatment

Niklas Tötsch; Daniel Hoffmann

arXiv:2006.11105·stat.ML·March 5, 2021

TL;DR

This paper introduces a probabilistic method for quantifying the uncertainty of classifier performance metrics from the confusion matrix alone, and shows that some published results are likely to be misleading because their uncertainties are large.

Contribution

It provides a simple, classifier-agnostic approach to assess the uncertainty of performance metrics and aids in sample size estimation for desired precision.
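
The sample size use case can be illustrated with a minimal sketch. It is not taken from the paper or its repository: it assumes a single rate-type metric (such as accuracy) with a flat Beta(1, 1) prior, and the names `interval_width` and `smallest_sample_size` are hypothetical. The idea is to pick an anticipated value of the metric, then search for the smallest test set whose posterior credible interval is narrower than a target width.

```python
import numpy as np


def interval_width(n, expected_rate, level=0.95, n_draws=20000, seed=0):
    """Approximate width of the credible interval for a rate observed on n samples.

    Assumption (not from the paper): flat Beta(1, 1) prior, so the posterior
    is Beta(successes + 1, failures + 1).
    """
    rng = np.random.default_rng(seed)
    successes = expected_rate * n  # anticipated number of correct predictions
    samples = rng.beta(successes + 1.0, n - successes + 1.0, size=n_draws)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(samples, [tail, 1.0 - tail])
    return upper - lower


def smallest_sample_size(expected_rate, target_width, candidates):
    """Return the first candidate size whose interval is narrow enough, else None."""
    for n in candidates:
        if interval_width(n, expected_rate) <= target_width:
            return n
    return None


# Hypothetical planning question: how many test samples are needed so that an
# accuracy of about 0.9 is known to within a 95% interval of width 0.10?
n_required = smallest_sample_size(0.9, 0.10, range(50, 1001, 50))
print("smallest candidate test set size:", n_required)
```

The candidate grid and the anticipated accuracy are illustrative inputs; in practice both would come from a pilot study or prior knowledge of the task.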

Findings

01

Uncertainties in performance metrics can be surprisingly large.

02

Some published classifiers' results are likely to be misleading once this uncertainty is taken into account.

03

The method is simple, requiring only the confusion matrix.

Abstract

Classifiers are often tested on relatively small data sets, which should lead to uncertain performance metrics. Nevertheless, these metrics are usually taken at face value. We present an approach to quantify the uncertainty of classification performance metrics, based on a probability model of the confusion matrix. Application of our approach to classifiers from the scientific literature and a classification competition shows that uncertainties can be surprisingly large and limit performance evaluation. In fact, some published classifiers are likely to be misleading. The application of our approach is simple and requires only the confusion matrix. It is agnostic of the underlying classifier. Our method can also be used for the estimation of sample sizes that achieve a desired precision of a performance metric.
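
To make the idea of a probability model of the confusion matrix concrete, here is a minimal sketch. It should be read as an illustration under stated assumptions, not as the authors' implementation: it assumes a binary classifier, treats the four cell counts as multinomial with a flat Dirichlet(1, 1, 1, 1) prior, and uses the hypothetical helper names `metric_posterior` and `credible_interval`. The example counts are made up.

```python
import numpy as np


def metric_posterior(tp, fn, fp, tn, n_draws=20000, seed=0):
    """Draw posterior samples of common metrics from a 2x2 confusion matrix.

    Assumption (illustrative, not confirmed by the source): the cell counts
    are multinomial and the prior over cell probabilities is Dirichlet(1, 1, 1, 1).
    """
    rng = np.random.default_rng(seed)
    counts = np.array([tp, fn, fp, tn], dtype=float)
    # Conjugate update: posterior is Dirichlet(counts + prior).
    probs = rng.dirichlet(counts + 1.0, size=n_draws)
    p_tp, p_fn, p_fp, p_tn = probs.T  # one array of draws per cell
    return {
        "accuracy": p_tp + p_tn,
        "sensitivity": p_tp / (p_tp + p_fn),
        "specificity": p_tn / (p_tn + p_fp),
        "precision": p_tp / (p_tp + p_fp),
    }


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior samples."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(samples, [tail, 1.0 - tail])
    return float(lower), float(upper)


# Made-up confusion matrix for a test set of 50 samples.
posterior = metric_posterior(tp=18, fn=2, fp=3, tn=27)
for name, samples in posterior.items():
    lower, upper = credible_interval(samples)
    print(f"{name}: 95% interval from {lower:.2f} to {upper:.2f}")
```

Only the four counts enter the calculation, which is why an approach of this kind is independent of the classifier that produced them. With a test set this small, the intervals span a wide range even though the point estimate of accuracy is 0.9.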

Code & Models

Repositories

niklastoe/classifier_metric_uncertainty
Official