Online Testing of Subgroup Treatment Effects Based on Value Difference
Miao Yu, Wenbin Lu, Rui Song

TL;DR
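This paper introduces SUBTLE, a sequential test for subgroup treatment effects in online A/B testing. It targets two weaknesses of the typical fixed-horizon test, inflated false positives under continuous monitoring and the assumption of homogeneous treatment effects, and comes with proven high detection power and support for early stopping.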
Contribution
The paper proposes a novel sequential test called SUBTLE that detects beneficial subgroups in online experiments while controlling false positives and allowing continuous monitoring.
Findings
SUBTLE maintains high detection power and controls type I error at any time.
It is more robust to noise covariates than traditional methods.
SUBTLE enables early stopping in experiments, saving resources.
Abstract
Online A/B testing plays a critical role in the high-tech industry to guide product development and accelerate innovation. It performs a null hypothesis statistical test to determine which variant is better. However, a typical A/B test presents two problems: (i) a fixed-horizon framework inflates the false-positive errors under continuous monitoring; (ii) the homogeneous effects assumption fails to identify a subgroup with a beneficial treatment effect. In this paper, we propose a sequential test for subgroup treatment effects based on value difference, named SUBTLE, to address these two problems simultaneously. The SUBTLE allows the experimenters to "peek" at the results during the experiment without harming the statistical guarantees. It assumes heterogeneous treatment effects and aims to test if some subgroup of the population will benefit from the investigative treatment. If the…
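Problem (i) in the abstract, the inflation of false positives when a fixed-horizon test is monitored continuously, can be illustrated with a small simulation. The sketch below is not SUBTLE and is not taken from the paper. It assumes a simple two-sided z-test with known unit variance on simulated A/A data (no true effect), and the function name `false_positive_rates` and all of its parameters are hypothetical choices made for illustration.

```python
import math
import random


def false_positive_rates(n_sims=1000, n_max=500, peek_every=25,
                         z_crit=1.96, seed=0):
    """Simulate A/A tests (no true effect) and compare two stopping rules.

    Returns (fixed_rate, peeking_rate):
      fixed_rate   - fraction of runs rejected when testing once at n_max
      peeking_rate - fraction of runs rejected at any interim look
    Assumes n_max is a multiple of peek_every, so the last look is at n_max.
    """
    rng = random.Random(seed)
    fixed_rejections = 0
    peeking_rejections = 0

    for _ in range(n_sims):
        running_sum = 0.0
        rejected_at_some_peek = False

        for n in range(1, n_max + 1):
            # Outcome difference under the null: mean 0, variance 1 (assumed).
            running_sum += rng.gauss(0.0, 1.0)

            # Interim look: apply the fixed-horizon threshold as if n were final.
            if n % peek_every == 0:
                z = running_sum / math.sqrt(n)
                if abs(z) > z_crit:
                    rejected_at_some_peek = True

        # Single test at the planned horizon.
        z_final = running_sum / math.sqrt(n_max)
        if abs(z_final) > z_crit:
            fixed_rejections += 1
        if rejected_at_some_peek:
            peeking_rejections += 1

    return fixed_rejections / n_sims, peeking_rejections / n_sims


if __name__ == "__main__":
    fixed, peeking = false_positive_rates()
    print(f"single test at the horizon: {fixed:.3f}")
    print(f"reject at any interim look: {peeking:.3f}")
```

Under these assumptions the single test should reject at roughly the nominal 5% level, while rejecting at any interim look should do so noticeably more often. That gap is what a sequential test with always-valid error control is designed to close.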
Taxonomy
Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · SARS-CoV-2 detection and testing
