Continuous Testing: Unifying Tests and E-values

Nick W. Koning

arXiv:2409.05654 · math.ST · May 12, 2025

TL;DR

This paper introduces a unified 'continuous testing' framework that connects e-values with classical hypothesis tests, providing a more general, optimal, and interpretable approach to statistical testing and evidence measurement.

Contribution

It unifies e-values and classical tests into a single continuous testing framework, establishing their theoretical relationship and advantages over traditional p-values.

Findings

01 Continuous tests generalize classical randomized tests.

02 The unified theory of optimal continuous testing nests both Neyman-Pearson-optimal tests and log-optimal e-values as special cases.

03 Continuous tests provide stronger evidence guarantees than p-values.

Abstract

The e-value is swiftly rising in prominence in many applications of hypothesis testing and multiple testing, yet its relationship to classical testing theory remains elusive. We unify e-values and classical testing into a single 'continuous testing' framework: we argue that e-values are simply the continuous generalization of a test. This cements their foundational role in hypothesis testing. Such continuous tests relate to the rejection probability of classical randomized tests, offering the benefits of randomized tests without the downsides of a randomized decision. By generalizing the traditional notion of power, we obtain a unified theory of optimal continuous testing that nests both classical Neyman-Pearson-optimal tests and log-optimal e-values as special cases. This implies the only difference between typical classical tests and typical e-values is a different choice of power…
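The link between a classical test and an e-value can be illustrated with a minimal sketch. It is not taken from the paper: the Gaussian null N(0, 1), the alternative N(MU, 1), the level ALPHA, and all function names are assumptions chosen for illustration. A level-ALPHA test taking values in {0, 1} is rescaled by 1/ALPHA so that it lives on [0, 1/ALPHA], and a Monte Carlo check confirms that both it and the likelihood-ratio e-value have mean close to 1 under the null.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05  # assumed significance level (illustrative)
MU = 1.0      # assumed mean under the alternative (illustrative)

# One-sided critical value of the classical level-ALPHA test of N(0, 1).
CRITICAL = NormalDist().inv_cdf(1 - ALPHA)


def np_test_as_e_value(x):
    """Classical Neyman-Pearson test, rescaled from {0, 1} to {0, 1/ALPHA}."""
    return 1.0 / ALPHA if x > CRITICAL else 0.0


def likelihood_ratio_e_value(x):
    """Likelihood ratio of N(MU, 1) against N(0, 1), evaluated at x."""
    return math.exp(MU * x - MU * MU / 2)


def null_mean(e_value, n=200_000, seed=0):
    """Monte Carlo estimate of the mean of e_value(X) for X drawn from N(0, 1)."""
    rng = random.Random(seed)
    return sum(e_value(rng.gauss(0.0, 1.0)) for _ in range(n)) / n


np_mean = null_mean(np_test_as_e_value)
lr_mean = null_mean(likelihood_ratio_e_value)
print(f"rescaled NP test, mean under null: {np_mean:.3f}")
print(f"likelihood ratio, mean under null: {lr_mean:.3f}")
```

Both quantities satisfy the defining e-value constraint (expectation at most 1 under the null). The rescaled test only ever reports 0 or 1/ALPHA, while the likelihood ratio reports a graded amount of evidence, which is the sense in which an e-value can be read as a continuous version of a test.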

Taxonomy

Topics: Reliability and Agreement in Measurement · Meta-analysis and systematic reviews