Empirical AUC for evaluating probabilistic forecasts

Simon Byrne

arXiv:1508.05503·math.ST·January 31, 2017

TL;DR

This paper examines the empirical AUC as a scoring function for probabilistic forecasts, shows that it is not in general a proper scoring function, and identifies restrictions and modifications under which it becomes proper.

Contribution

It analyzes whether the empirical AUC is a proper scoring function for predicted probabilities of a set of binary outcomes, and describes restrictions and modifications under which it is.

Findings

01. The AUC is not generally a proper scoring function.

02. With some restrictions, or with certain modifications, the AUC can be made proper.

03. Under certain circumstances, the expected AUC can be improved by quoting probabilities that differ from their true values.
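
For reference, the two notions used in these findings can be written out as follows. This is a sketch in assumed notation, not taken from the paper: $y_i \in \{0, 1\}$ are the outcomes, $q_i$ the quoted probabilities, $p$ the true probabilities, and $S$ a positively oriented scoring function. Counting ties as one half is a common convention and may differ from the one the paper adopts.

```latex
% Empirical AUC: the fraction of (positive, negative) pairs that the quoted
% probabilities rank correctly, with ties counted as 1/2 (assumed convention).
% It is undefined when n_1 = 0 or n_0 = 0.
\[
  \mathrm{AUC}(q, y)
  = \frac{1}{n_1 n_0}
    \sum_{i :\, y_i = 1} \; \sum_{j :\, y_j = 0}
    \Bigl( \mathbf{1}[q_i > q_j] + \tfrac{1}{2}\, \mathbf{1}[q_i = q_j] \Bigr),
  \qquad
  n_1 = \sum_{i=1}^{n} y_i, \quad n_0 = n - n_1 .
\]

% Propriety: quoting the true probabilities p maximises the expected score.
\[
  \mathbb{E}_{Y \sim P}\bigl[ S(p, Y) \bigr]
  \;\ge\;
  \mathbb{E}_{Y \sim P}\bigl[ S(q, Y) \bigr]
  \qquad \text{for every quoted } q .
\]
```

A scoring function fails to be proper when some $q \ne p$ makes the second inequality strict in the other direction.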

Abstract

Scoring functions are used to evaluate and compare partially probabilistic forecasts. We investigate the use of rank-sum functions such as empirical Area Under the Curve (AUC), a widely used measure of classification performance, as a scoring function for the prediction of probabilities of a set of binary outcomes. It is shown that the AUC is not generally a proper scoring function, that is, under certain circumstances it is possible to improve on the expected AUC by modifying the quoted probabilities from their true values. However, with some restrictions, or with certain modifications, it can be made proper.
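
The kind of impropriety described in the abstract can be illustrated with a small numerical sketch. The example below is a hypothetical construction, not one taken from the paper: the joint distribution `joint`, the helper names `empirical_auc` and `expected_auc`, the tie weight of one half, and the choice to condition on the AUC being defined (at least one positive and one negative outcome) are all assumptions made for illustration. The four outcomes are deliberately dependent.

```python
def empirical_auc(y, q):
    """Fraction of (positive, negative) pairs ranked correctly by q.

    Ties count as 1/2 (assumed convention). Returns None when all
    outcomes are equal, since the AUC is then undefined.
    """
    pos = [qi for yi, qi in zip(y, q) if yi == 1]
    neg = [qi for yi, qi in zip(y, q) if yi == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def expected_auc(joint, q):
    """Expected empirical AUC under `joint`, conditional on it being defined."""
    total = 0.0
    mass = 0.0
    for y, prob in joint.items():
        auc = empirical_auc(y, q)
        if auc is not None:
            total += prob * auc
            mass += prob
    return total / mass


# Hypothetical joint distribution over four dependent binary outcomes.
joint = {
    (1, 0, 1, 0): 0.30,
    (0, 1, 0, 0): 0.25,
    (0, 0, 0, 0): 0.45,  # AUC undefined here; skipped by expected_auc
}

# True marginal probabilities of each outcome: (0.3, 0.25, 0.3, 0.0).
true_p = tuple(sum(prob * y[i] for y, prob in joint.items()) for i in range(4))

# Misreported forecast: the second probability is raised above the others.
shifted = (0.3, 0.35, 0.3, 0.0)

print(round(expected_auc(joint, true_p), 3))   # 0.697  (= 23/33)
print(round(expected_auc(joint, shifted), 3))  # 0.727  (= 8/11)
```

In this construction, quoting 0.35 instead of the true 0.25 for the second outcome raises the expected AUC. The cause is the `1 / (n1 * n0)` normalisation: a correctly ranked pair carries weight 1/3 when there is one positive outcome and 1/4 when there are two, so the less likely outcome can be worth more to rank first. Scoring the undefined case as 0 or as 0.5 instead of skipping it adds the same constant to both forecasts and leaves the comparison unchanged.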
