YEAST: Yet Another Sequential Test
Alexey Kurennoy, Majed Dodin, Tural Gurbanov, Ana Peleteiro Ramallo

TL;DR
This paper introduces YEAST, a new sequential testing method for online A/B experiment evaluation that allows continuous monitoring with improved power and flexibility over existing approaches.
Contribution
YEAST is a novel sequential test that enables continuous monitoring of A/B experiments, overcoming limitations of previous methods in interim analysis and statistical power.
Findings
Outperforms existing sequential testing methods in simulations
Allows continuous monitoring without increasing false discovery risk
Validated through semi-synthetic experiments
Abstract
Online evaluation of machine learning models is typically conducted through A/B experiments. Sequential statistical tests are valuable tools for analysing these experiments, as they enable researchers to stop data collection early without increasing the risk of false discoveries. However, existing sequential tests either limit the number of interim analyses or suffer from low statistical power. In this paper, we introduce a novel sequential test designed for continuous monitoring of A/B experiments. We validate our method using semi-synthetic simulations and demonstrate that it outperforms current state-of-the-art sequential testing approaches. Our method is derived using a new technique that inverts a bound on the probability of threshold crossing, based on a classical maximal inequality.
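To make the abstract's last two points concrete, here is a minimal simulation sketch, not the authors' implementation. The abstract does not name the maximal inequality, so the sketch assumes Lévy's inequality for sums of independent symmetric increments, P(max_{n≤N} S_n ≥ b) ≤ 2 P(S_N ≥ b). Inverting that bound for a target level alpha gives a constant threshold b = sqrt(V_N) · z_{1-alpha/2}, where V_N is the variance of the final cumulative sum. The function names, the Gaussian unit-variance increments, and the one-sided setup are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def levy_threshold(alpha, v_n):
    """Constant boundary b solving 2 * P(N(0, v_n) >= b) = alpha.

    Assumption: the bound being inverted is Levy's inequality; the source
    only says "a classical maximal inequality".
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) * v_n ** 0.5


def simulate_null(n_runs=2000, horizon=200, alpha=0.05, seed=0):
    """Estimate false-positive rates under the null (no treatment effect).

    Compares two rules that are checked after every observation:
      * naive peeking: reject when the fixed-horizon one-sided z-test rejects;
      * constant boundary: reject when the cumulative sum crosses b.
    Returns (naive_rate, bounded_rate).
    """
    rng = random.Random(seed)
    z_fixed = NormalDist().inv_cdf(1.0 - alpha)
    # Unit-variance increments, so Var(S_N) = N (hypothetical setup).
    b = levy_threshold(alpha, v_n=float(horizon))
    naive_hits = 0
    bounded_hits = 0
    for _ in range(n_runs):
        s = 0.0
        naive = False
        bounded = False
        for n in range(1, horizon + 1):
            s += rng.gauss(0.0, 1.0)
            if s / n ** 0.5 >= z_fixed:
                naive = True
            if s >= b:
                bounded = True
        naive_hits += naive
        bounded_hits += bounded
    return naive_hits / n_runs, bounded_hits / n_runs


if __name__ == "__main__":
    naive_rate, bounded_rate = simulate_null()
    print(f"naive peeking false-positive rate:     {naive_rate:.3f}")
    print(f"constant-boundary false-positive rate: {bounded_rate:.3f}")
```

Under these assumptions, the naive rule should reject far more often than the nominal 5%, while the constant boundary should stay at or below it. The boundary depends on the horizon N through V_N, which is why a rule of this kind needs a finite monitoring period to be fixed in advance.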
Peer Reviews
Decision: NeurIPS 2025 poster
**Strengths:**
- The idea is short, clear, and simple.
- YEAST can significantly outperform recent methods in some settings, and it is inexpensive in terms of parameters.
- Sequential testing is a timely topic.

**Negatives:**
- YEAST requires a finite horizon N, whereas a method such as GAVI makes no assumptions on the stopping rule. This requirement makes it less straightforward to claim that YEAST is superior to GAVI.
- The semi

The paper is written very clearly and the exposition is very easy to follow. In particular, I really appreciated the informal introduction of the method in (6) and (7) before the formal proof in Theorem 1. The method requires estimating V_N (the normalized variation of the metric being monitored). This is a limiting factor of the method but the authors have a nice discussion regarding the impact of the estimation error on the Type-I error based on their Theorem 1. My main concern is the novelt
On the positive side, the presentation of the paper is clear and well-structured, making the proposed method and its derivation accessible. Additionally, the empirical results presented in Section 4 show that YEAST achieves accurate false detection rate control and high power. On the negative side, while Theorems 1 and 2 provide theoretical guarantees for a sufficiently large monitoring period N, the paper does not determine what constitutes a sufficiently large N in real-world scenarios. This
