An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits
Biyonka Liang, Iavor Bojinov

TL;DR
This paper introduces the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandits that supports anytime-valid causal inference, so experiments can be stopped early based on the data while the inference remains valid.
Contribution
MAD mixes any bandit algorithm with a Bernoulli design to provide anytime-valid inference on the Average Treatment Effect (ATE), balancing regret minimization against inferential precision.
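As an illustration of the mixing idea, the following is a minimal two-arm sketch, not the authors' implementation. Several pieces are assumptions made only for this example: the Bernoulli design assigns each arm with probability 1/2, the mixing weight follows a hypothetical polynomially decaying sequence `delta_t = t ** -decay`, the bandit is a simple greedy stand-in, and the ATE is estimated with a plain inverse-probability-weighted average. The anytime-valid confidence sequence itself is not constructed here.

```python
import random


def mad_assignment_prob(bandit_prob_treat, delta_t):
    """Treatment probability after mixing a bandit with a Bernoulli(1/2) design.

    With weight delta_t the unit is assigned by the Bernoulli design,
    otherwise by the bandit's own treatment probability.
    """
    return delta_t * 0.5 + (1.0 - delta_t) * bandit_prob_treat


def run_mad(T, true_means=(0.0, 0.5), decay=0.25, seed=0):
    """Simulate T rounds; return (IPW estimate of the ATE, smallest arm probability)."""
    rng = random.Random(seed)
    sums = [0.0, 0.0]
    counts = [0, 0]
    ipw_total = 0.0
    min_prob = 1.0
    for t in range(1, T + 1):
        # Hypothetical mixing sequence (an assumption for this sketch).
        delta_t = t ** (-decay)
        # Greedy stand-in bandit: favour the arm with the higher running mean.
        if counts[0] == 0 or counts[1] == 0:
            p_bandit = 0.5
        else:
            better = sums[1] / counts[1] > sums[0] / counts[0]
            p_bandit = 0.95 if better else 0.05
        p_treat = mad_assignment_prob(p_bandit, delta_t)
        min_prob = min(min_prob, p_treat, 1.0 - p_treat)
        arm = 1 if rng.random() < p_treat else 0
        reward = rng.gauss(true_means[arm], 1.0)
        sums[arm] += reward
        counts[arm] += 1
        # Inverse-probability weighting keeps each term unbiased for the ATE.
        ipw_total += reward / p_treat if arm == 1 else -reward / (1.0 - p_treat)
    return ipw_total / T, min_prob


if __name__ == "__main__":
    estimate, min_prob = run_mad(20000)
    print(f"IPW ATE estimate: {estimate:.3f}, smallest assignment probability: {min_prob:.3f}")
```

Because the Bernoulli component keeps every assignment probability away from 0 and 1, the inverse-probability weights stay bounded, which is what makes inference on the ATE feasible regardless of how aggressively the bandit exploits.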
Findings
MAD achieves finite-sample anytime-validity.
MAD accurately estimates the ATE.
MAD maintains reward performance close to standard bandit designs.
Abstract
Experimentation is crucial for managers to rigorously quantify the value of a change and determine if it leads to a statistically significant improvement over the status quo. As companies increasingly mandate that all changes undergo experimentation before widespread release, two challenges arise: (1) minimizing the proportion of customers assigned to the inferior treatment and (2) increasing experimentation velocity by enabling data-dependent stopping. This paper addresses both challenges by introducing the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit (MAB) algorithms that enables anytime-valid inference on the Average Treatment Effect (ATE) for any MAB algorithm. Intuitively, MAD "mixes" any bandit algorithm with a Bernoulli design, where at each time step, the probability of assigning a unit via the Bernoulli design is determined by a…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Machine Learning and Data Classification
