Evaluating probabilistic classifiers: Reliability diagrams and score decompositions revisited

Timo Dimitriadis; Tilmann Gneiting; Alexander I. Jordan

arXiv:2008.03033·stat.ME·August 26, 2021

TL;DR

This paper introduces the CORP approach, which produces stable and reproducible reliability diagrams for probabilistic classifiers and supports calibration assessment with statistical consistency guarantees and uncertainty quantification.

Contribution

The paper presents the CORP method, which is based on isotonic regression and implemented via the pool-adjacent-violators (PAV) algorithm. It yields automated, statistically consistent reliability diagrams, together with uncertainty quantification and a generalized score decomposition.

Findings

01

CORP provides stable, reproducible reliability diagrams.

02

It enables uncertainty quantification via resampling or asymptotic theory.

03

It offers a Brier score decomposition that extends to other proper scoring rules.
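
The decomposition named in the third finding can be written out as follows. This is a sketch in notation assumed here, not quoted from this page: \bar{S}_X denotes the mean score of the original forecasts, \bar{S}_C the mean score of the PAV-recalibrated forecasts, and \bar{S}_R the mean score of a constant reference forecast equal to the observed event frequency. MCB, DSC, and UNC stand for miscalibration, discrimination, and uncertainty.

```latex
\begin{align*}
\mathrm{MCB} &= \bar{S}_X - \bar{S}_C, \\
\mathrm{DSC} &= \bar{S}_R - \bar{S}_C, \\
\mathrm{UNC} &= \bar{S}_R, \\
\bar{S}_X &= \mathrm{MCB} - \mathrm{DSC} + \mathrm{UNC}.
\end{align*}
```

With the Brier score as the scoring rule, \bar{S}_X is the mean squared difference between forecast probability and binary outcome; substituting another proper scoring rule leaves the form of the identity unchanged.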

Abstract

A probability forecast or probabilistic classifier is reliable or calibrated if the predicted probabilities are matched by ex post observed frequencies, as examined visually in reliability diagrams. The classical binning and counting approach to plotting reliability diagrams has been hampered by a lack of stability under unavoidable, ad hoc implementation decisions. Here we introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way. CORP is based on non-parametric isotonic regression and implemented via the pool-adjacent-violators (PAV) algorithm; essentially, the CORP reliability diagram shows the graph of the PAV-(re)calibrated forecast probabilities. The CORP approach allows for uncertainty quantification via either resampling techniques or asymptotic theory, furnishes a new numerical…
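
To make the PAV step described in the abstract concrete, here is a minimal pure-Python sketch of isotonic recalibration for binary outcomes. It is an illustration under stated assumptions, not the authors' implementation: the function name `pav_recalibrate` and the toy data are hypothetical, and pooling tied forecast values before the PAV pass is a choice made here so that equal forecasts receive equal recalibrated values.

```python
from itertools import groupby


def pav_recalibrate(probs, outcomes):
    """Return PAV-recalibrated probabilities in the original input order.

    Hypothetical helper for illustration: `probs` are forecast probabilities,
    `outcomes` are the matching binary (0/1) observations.
    """
    # Indices of the cases, sorted by forecast probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i])

    blocks = []  # each block holds [sum of outcomes, number of cases]
    for _, tied in groupby(order, key=lambda i: probs[i]):
        tied = list(tied)
        # Cases with identical forecasts start out pooled in one block.
        blocks.append([sum(outcomes[i] for i in tied), len(tied)])
        # Pool adjacent blocks while the earlier block has the higher
        # event frequency (cross-multiplied to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n

    # Assign each block's event frequency back to its member cases.
    recalibrated = [0.0] * len(probs)
    pos = 0
    for s, n in blocks:
        for i in order[pos:pos + n]:
            recalibrated[i] = s / n
        pos += n
    return recalibrated


# Toy data, invented for this example only.
probs = [0.1, 0.3, 0.3, 0.6, 0.8, 0.9]
outcomes = [0, 1, 0, 0, 1, 1]
recalibrated = pav_recalibrate(probs, outcomes)
```

Plotting the pairs `(probs[i], recalibrated[i])` as a step function against the diagonal gives a diagram of the kind the abstract describes. The bins are the pooled blocks, so no bin count or bin edges have to be chosen by hand.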
