On the bias, risk and consistency of sample means in multi-armed bandits
Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo

TL;DR
This paper systematically analyzes the bias, risk, and consistency of sample means in multi-armed bandit (MAB) experiments. It identifies four sources of selection bias (sampling, stopping, choosing and rewinding) and bounds the risk of the sample mean through a new notion of effective sample size, using a nonparametric, algorithm-agnostic approach.
Contribution
The paper introduces a framework for understanding the bias and risk of MAB sample means, including new risk bounds and the concept of effective sample size. The analysis does not depend on any specific bandit algorithm.
Findings
Identifies four sources of selection bias in MAB sample means.
Proposes an effective sample size to bound the risk of the sample mean.
Provides nonparametric, algorithm-agnostic analysis with new concentration inequalities.
Abstract
The sample mean is among the most well-studied estimators in statistics, having many desirable properties such as unbiasedness and consistency. However, when analyzing data collected using a multi-armed bandit (MAB) experiment, the sample mean is biased and much remains to be understood about its properties. For example, when is it consistent, how large is its bias, and can we bound its mean squared error? This paper delivers a thorough and systematic treatment of the bias, risk and consistency of MAB sample means. Specifically, we identify four distinct sources of selection bias (sampling, stopping, choosing and rewinding) and analyze them both separately and together. We further demonstrate that a new notion of effective sample size can be used to bound the risk of the sample mean under suitable loss functions. We present several carefully designed examples to provide intuition…
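To make the abstract's claim concrete, the following is a minimal simulation sketch of the first source of bias (adaptive sampling). It is not taken from the paper: the two-armed setup, the greedy selection rule, the standard normal rewards, and the function names `greedy_sample_mean` and `estimate_bias` are all illustrative assumptions. Both arms have true mean 0, so the average of arm 0's final sample mean over many runs estimates its bias directly.

```python
import random


def greedy_sample_mean(horizon, rng):
    """Run one two-armed greedy bandit with N(0, 1) rewards.

    Hypothetical setup for illustration: each arm is pulled once, then the
    arm with the larger running sample mean is pulled in every later round.
    Returns the final sample mean of arm 0.
    """
    sums = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # one initial pull per arm
    counts = [1, 1]
    for _ in range(horizon - 2):
        # Greedy rule (an assumption, not the paper's algorithm).
        arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        sums[arm] += rng.gauss(0.0, 1.0)
        counts[arm] += 1
    return sums[0] / counts[0]


def estimate_bias(horizon=10, replications=20000, seed=0):
    """Monte Carlo estimate of the bias of arm 0's sample mean.

    The true mean is 0, so the average sample mean is the bias estimate.
    """
    rng = random.Random(seed)
    total = sum(greedy_sample_mean(horizon, rng) for _ in range(replications))
    return total / replications


if __name__ == "__main__":
    print(f"adaptive (horizon=10): {estimate_bias(horizon=10):+.3f}")
    print(f"non-adaptive (horizon=2): {estimate_bias(horizon=2):+.3f}")
```

With `horizon=2` each arm is pulled exactly once and no adaptive choice is made, so the estimate should sit near zero. With a longer horizon, an arm whose early draws are low tends to be abandoned before later draws can pull its average back up, so the estimate should come out negative under this greedy rule.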
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference
