A Two-armed Bandit Framework for A/B Testing
Jinjuan Wang, Qianglin Wen, Yu Zhang, Xiaodong Yan, Chengchun Shi

TL;DR
This paper introduces a novel two-armed bandit framework for A/B testing that enhances statistical power by combining doubly robust estimation, bandit-based test statistics, and permutation methods, validated through theory, simulations, and real data.
Contribution
It presents a new testing procedure that integrates bandit algorithms with causal inference techniques to improve the power of A/B tests relative to existing methods.
Findings
Demonstrates superior performance in simulations
Shows effectiveness on real ridesharing data
Provides asymptotic theoretical guarantees
Abstract
A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly developed policy against a standard control. Various causal inference and reinforcement learning methods developed in the literature are applicable to A/B testing. This paper introduces a two-armed bandit framework designed to improve the power of existing approaches. The proposed procedure consists of three main steps: (i) employing doubly robust estimation to generate pseudo-outcomes, (ii) utilizing a two-armed bandit framework to construct the test statistic, and (iii) applying a permutation-based method to compute the p-value. We demonstrate the efficacy of the proposed method through asymptotic theory, numerical experiments, and real-world data from a ridesharing company, showing its superior performance in comparison to…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
