A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit
Giuseppe Burtini, Jason Loeppky, Ramon Lawrence

TL;DR
This survey reviews online experiment design using stochastic multi-armed bandits, covering models, complications, and regret bounds to guide future research and practical decision-making.
Contribution
It synthesizes existing research on stochastic multi-armed bandits for online experiments, providing a comprehensive taxonomy and regret bounds table.
Findings
Taxonomy of complications in bandit models
Summary of regret bounds for algorithms
Guidance for future theoretical and practical work
Abstract
Adaptive and sequential experiment design is a well-studied area in numerous domains. We survey and synthesize work on the online statistical learning paradigm known as the multi-armed bandit, integrating the existing research into a resource for a certain class of online experiments. We first explore the traditional stochastic model of a multi-armed bandit, then present a taxonomy of complications to that model, relating each complication to a specific requirement or consideration of the experiment design context. Finally, at the end of the paper, we present a table of known upper bounds on regret for all studied algorithms, providing both a perspective for future theoretical work and a decision-making tool for practitioners looking for theoretical guarantees.
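To make the terms "stochastic model" and "regret" concrete, here is a minimal sketch in Python. It assumes Bernoulli-reward arms with fixed success probabilities and uses UCB1, a well-known index policy chosen purely for illustration; the function name `run_ucb1`, the arm means, and the horizon are hypothetical and are not taken from the survey. Regret is computed as pseudo-regret: the cumulative gap between the best arm's mean and the mean of each arm actually played.

```python
import math
import random


def run_ucb1(means, horizon, seed=0):
    """Play UCB1 on Bernoulli arms; return (pull counts, cumulative pseudo-regret).

    `means` holds the true success probabilities, which the policy never
    reads directly; they are used only to draw rewards and to score regret.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialise its estimate
        else:
            # Empirical mean plus an exploration bonus that shrinks
            # as an arm accumulates pulls.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return counts, regret


# Illustrative values only: three arms, 5000 rounds.
counts, regret = run_ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, round(regret, 1))
```

In an online experiment, each arm would correspond to one treatment and each pull to one subject assigned to it. The regret figure is the expected reward lost by not assigning every subject to the best treatment from the start.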
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Advanced Multi-Objective Optimization Algorithms
