It's About Time: What A/B Test Metrics Estimate

Sebastian Ankargren; Mattias Frånberg; Mårten Schultzberg

arXiv:2411.06150·stat.ME·November 12, 2024

Open Access

TL;DR

This paper analyzes the estimands of cumulative versus windowed metrics in online A/B tests, revealing trade-offs in statistical power, timing, and interpretability based on experiment context.

Contribution

It provides a detailed comparison of cumulative and windowed metrics, highlighting their respective advantages, limitations, and implications for experimental design in digital environments.

Findings

01

Because the estimand of a cumulative metric depends on experiment intake and runtime, collecting more observations can decrease statistical power.

02

Cumulative metrics deliver results earlier and can give a quick signal of an effect.

03

Neither metric type is universally better; the choice depends on the specifics of the experiment.

Abstract

Online controlled experiments, or A/B tests, are large-scale randomized trials in digital environments. This paper investigates the estimands of the difference-in-means estimator in these experiments, focusing on scenarios with repeated measurements on users. We compare cumulative metrics that use all post-exposure data for each user to windowed metrics that measure each user over a fixed time window. We analyze the estimands and highlight trade-offs between the two types of metrics. Our findings reveal that while cumulative metrics eliminate the need for pre-defined measurement windows, they result in estimands that are more intricately tied to the experiment intake and runtime. This complexity can lead to counter-intuitive practical consequences, such as decreased statistical power with more observations. However, cumulative metrics offer earlier results and can quickly detect strong…
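The distinction between the two metric types can be made concrete with a toy sketch. This is not the authors' code or notation: the user records, the `estimate` helper, and all numbers are hypothetical, chosen only to show how a cumulative difference-in-means can change with the analysis day while a windowed one stays fixed once each user's window is complete.

```python
from statistics import mean

# Hypothetical users: (group, exposure_day, daily outcomes after exposure).
# Treatment users produce 2 units per day, control users 1 unit per day.
USERS = [
    ("treatment", 0, [2, 2, 2, 2]),
    ("treatment", 2, [2, 2, 2, 2]),
    ("control", 0, [1, 1, 1, 1]),
    ("control", 2, [1, 1, 1, 1]),
]


def user_value(daily, exposure_day, analysis_day, window=None):
    """Return one user's metric value at analysis time.

    window=None -> cumulative: sum of all post-exposure days observed so far.
    window=k    -> windowed: sum of the first k post-exposure days, or None
                   if the user has not yet been observed for k days.
    """
    days_observed = analysis_day - exposure_day
    if window is None:
        return sum(daily[:days_observed])
    if days_observed < window:
        return None  # window incomplete, user excluded from the analysis
    return sum(daily[:window])


def estimate(users, analysis_day, window=None):
    """Difference in means (treatment minus control) at analysis_day."""
    values = {"treatment": [], "control": []}
    for group, exposure_day, daily in users:
        v = user_value(daily, exposure_day, analysis_day, window)
        if v is not None:
            values[group].append(v)
    return mean(values["treatment"]) - mean(values["control"])


# Cumulative: the estimate moves as the experiment runs longer.
print(estimate(USERS, analysis_day=3))            # 2
print(estimate(USERS, analysis_day=4))            # 3
# Windowed (2-day window): the estimate is the same on both days.
print(estimate(USERS, analysis_day=3, window=2))  # 2
print(estimate(USERS, analysis_day=4, window=2))  # 2
```

In this toy setup the per-day effect is constant, yet the cumulative estimate grows from 2 to 3 between day 3 and day 4 because late-exposed users contribute fewer days than early ones. The windowed estimate does not move, but on day 3 it can only use the users whose 2-day window is already complete.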

Taxonomy

Topics: Advanced Causal Inference Techniques · Mobile Crowdsensing and Crowdsourcing · Survey Methodology and Nonresponse