Valid sequential inference on probability forecast performance

Alexander Henzi; Johanna F. Ziegel

arXiv:2103.08402·stat.ME·July 4, 2022

TL;DR

This paper introduces e-values for sequentially testing whether score differences between competing probability forecasts are statistically significant. The resulting inference stays valid under optional stopping and requires no assumptions on the data-generating process.

Contribution

It constructs e-values, valid in finite samples, for comparing the scores of competing forecasts in sequential settings, offered as an alternative to traditional p-value-based tests.

Findings

01

E-values provide valid sequential testing without assumptions on the data-generating process.

02

E-values and traditional tests agree in a precipitation forecast case study.

03

The method allows optional stopping without invalidating the results.
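
Optional-stopping guarantees of this kind usually rest on Ville's inequality for nonnegative supermartingales. As a sketch, with notation that is assumed here rather than taken from the paper: let $E_t$ be the accumulated e-value after $t$ observations, with $E_0 = 1$ and $(E_t)$ a nonnegative supermartingale under the null hypothesis $H_0$, let $\alpha \in (0, 1)$ be a significance level, and let $\tau$ be any stopping time.

```latex
\[
  \mathbb{P}_{H_0}\Bigl( \sup_{t \ge 1} E_t \ge \tfrac{1}{\alpha} \Bigr) \le \alpha,
  \qquad
  \mathbb{E}_{H_0}\bigl[ E_\tau \bigr] \le 1 .
\]
```

Rejecting the first time $E_t$ reaches $1/\alpha$ therefore keeps the type I error at most $\alpha$, no matter when or why the evaluation is stopped.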

Abstract

Probability forecasts for binary events play a central role in many applications. Their quality is commonly assessed with proper scoring rules, which assign forecasts a numerical score such that a correct forecast achieves a minimal expected score. In this paper, we construct e-values for testing the statistical significance of score differences of competing forecasts in sequential settings. E-values have been proposed as an alternative to p-values for hypothesis testing, and they can easily be transformed into conservative p-values by taking the multiplicative inverse. The e-values proposed in this article are valid in finite samples without any assumptions on the data generating processes. They also allow optional stopping, so a forecast user may decide to interrupt evaluation taking into account the available data at any time and still draw statistically valid inference, which is…
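
To make the idea concrete, here is a minimal sketch of a betting-style sequential e-value for comparing two probability forecasts under the Brier score. It is an illustration under stated assumptions, not necessarily the construction used in the paper: the null hypothesis is taken to be "forecast p scores no worse than forecast q in conditional expectation", the betting fraction `lam` is a hypothetical fixed tuning parameter, and all function names are invented for this example. Each score difference is divided by its largest possible magnitude in favour of p, so every factor is nonnegative and has expectation at most 1 under the null.

```python
from itertools import accumulate
from operator import mul


def brier(p: float, y: int) -> float:
    """Brier score of forecast probability p for binary outcome y (lower is better)."""
    return (p - y) ** 2


def e_factor(p: float, q: float, y: int, lam: float = 0.5) -> float:
    """One-step e-value for H0: p scores no worse than q in expectation.

    lam in [0, 1] is a fixed betting fraction (an assumption of this sketch).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if p == q:
        return 1.0  # identical forecasts carry no evidence either way
    # Outcome under which p beats q by the widest margin.
    y_best = 1 if p > q else 0
    scale = abs(brier(p, y_best) - brier(q, y_best))
    diff = brier(p, y) - brier(q, y)  # positive when p did worse than q
    return 1.0 + lam * diff / scale   # always >= 1 - lam >= 0


def e_process(ps, qs, ys, lam: float = 0.5):
    """Running product of one-step e-values; large values are evidence against H0."""
    factors = (e_factor(p, q, y, lam) for p, q, y in zip(ps, qs, ys))
    return list(accumulate(factors, mul))


def conservative_p(e: float) -> float:
    """Conservative p-value from an e-value: the multiplicative inverse, capped at 1."""
    return min(1.0, 1.0 / e) if e > 0 else 1.0


if __name__ == "__main__":
    # Toy data (made up): q is sharper than p and turns out right each time.
    ps = [0.5, 0.5, 0.5, 0.5]
    qs = [0.9, 0.9, 0.1, 0.9]
    ys = [1, 1, 0, 1]
    for t, e in enumerate(e_process(ps, qs, ys), start=1):
        print(t, round(e, 3), round(conservative_p(e), 3))
```

A user monitoring this process could stop as soon as the running product exceeds 1/α for a chosen level α, or stop early for any other reason and report the current value. A fixed `lam` is the simplest choice; any value chosen using only past data would also preserve validity.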
