YEAST: Yet Another Sequential Test

Alexey Kurennoy; Majed Dodin; Tural Gurbanov; Ana Peleteiro Ramallo

arXiv:2406.16523·stat.ME·October 8, 2025

TL;DR

This paper introduces YEAST, a new sequential testing method for online A/B experiment evaluation that allows continuous monitoring with improved power and flexibility over existing approaches.

Contribution

YEAST is a novel sequential test that enables continuous monitoring of A/B experiments. It addresses two limitations of earlier sequential tests, which either restrict the number of interim analyses or suffer from low statistical power.

Findings

01 Outperforms existing sequential testing methods in simulations

02 Allows continuous monitoring without increasing false discovery risk

03 Validated through semi-synthetic experiments

Abstract

Online evaluation of machine learning models is typically conducted through A/B experiments. Sequential statistical tests are valuable tools for analysing these experiments, as they enable researchers to stop data collection early without increasing the risk of false discoveries. However, existing sequential tests either limit the number of interim analyses or suffer from low statistical power. In this paper, we introduce a novel sequential test designed for continuous monitoring of A/B experiments. We validate our method using semi-synthetic simulations and demonstrate that it outperforms current state-of-the-art sequential testing approaches. Our method is derived using a new technique that inverts a bound on the probability of threshold crossing, based on a classical maximal inequality.
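The following is a minimal illustrative sketch, not the paper's procedure. It assumes i.i.d. Gaussian treatment-control differences with known standard deviation `sigma` and a fixed horizon `horizon`; the function name and all parameters are hypothetical. It compares two one-sided rules under the null hypothesis: re-running a fixed-sample z-test after every observation, and a constant boundary on the cumulative sum obtained by inverting Lévy's maximal inequality, P(max_{n≤N} S_n ≥ b) ≤ 2·P(S_N ≥ b), which holds for sums of independent symmetric increments.

```python
import numpy as np
from statistics import NormalDist


def simulate_crossing_rates(n_sims=2000, horizon=500, alpha=0.05, sigma=1.0, seed=0):
    """Estimate false-positive rates of two monitoring rules under H0.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Under H0 the per-observation treatment-control differences have mean zero.
    increments = rng.normal(0.0, sigma, size=(n_sims, horizon))
    cum_sums = np.cumsum(increments, axis=1)
    n = np.arange(1, horizon + 1)

    # Rule 1 (naive peeking): one-sided fixed-sample z-test after every observation.
    z_fixed = NormalDist().inv_cdf(1 - alpha)
    naive_reject = (cum_sums / (sigma * np.sqrt(n)) > z_fixed).any(axis=1)

    # Rule 2 (constant boundary): choose b so that 2 * P(S_N >= b) = alpha,
    # which by Levy's inequality bounds the probability of ever crossing b.
    b = NormalDist().inv_cdf(1 - alpha / 2) * sigma * np.sqrt(horizon)
    bounded_reject = (cum_sums >= b).any(axis=1)

    return naive_reject.mean(), bounded_reject.mean()


if __name__ == "__main__":
    naive_rate, bounded_rate = simulate_crossing_rates()
    print(f"naive peeking false-positive rate:     {naive_rate:.3f}")
    print(f"constant-boundary false-positive rate: {bounded_rate:.3f}")
```

In this sketch the naive rule rejects far more often than the nominal level, while the constant boundary stays near or below it. In practice the variance would have to be estimated rather than assumed known.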

Peer Reviews

Decision·NeurIPS 2025 poster

Reviewer 01·Rating 5·Confidence 4

Strengths

**Strengths:**
- The idea is short and clear/simple
- YEAST has the capacity to significantly outperform current/recent methods in some settings. Meanwhile, it is not expensive in terms of parameters
- Sequential Testing is a topic that is current

**Negatives:**
- YEAST requires a finite horizon N, whereas another method like GAVI makes no assumptions on the stopping rule. This requirement by YEAST makes it so that it's less straightforward to claim that YEAST is superior to GAVI.
- The semi

Reviewer 02·Rating 5·Confidence 3

Strengths

The paper is written very clearly and the exposition is very easy to follow. In particular, I really appreciated the informal introduction of the method in (6) and (7) before the formal proof in Theorem 1. The method requires estimating V_N (the normalized variation of the metric being monitored). This is a limiting factor of the method but the authors have a nice discussion regarding the impact of the estimation error on the Type-I error based on their Theorem 1. My main concern is the novelt

Reviewer 03·Rating 2·Confidence 3

Strengths

On the positive side, the presentation of the paper is clear and well-structured, making the proposed method and its derivation accessible. Additionally, the empirical results presented in Section 4 show that YEAST achieves accurate false detection rate control and high power. On the negative side, while Theorems 1 and 2 provide theoretical guarantees for a sufficiently large monitoring period N, the paper does not determine what constitutes a sufficiently large N in real-world scenarios. This

Videos

YEAST: Yet Another Sequential Test·slideslive

Taxonomy

Topics·Effects of Environmental Stressors on Livestock