Woodroofe's one-armed bandit problem revisited
Alexander Goldenshluger, Assaf Zeevi

TL;DR
This paper revisits Woodroofe's one-armed bandit problem, analyzing minimax strategies with covariates, and demonstrates how regret and sampling rates depend on covariate distribution properties.
Contribution
It develops rate-optimal policies for the problem, extending the classical model to incorporate covariates and analyzing their impact on regret and sampling.
Findings
Regret and the rate of sampling from the inferior population depend on "local" properties of the covariate distribution.
Proposed policies are rate-optimal and modify the myopic rule.
Regret can be finite or grow with the time horizon, depending on those local covariate characteristics.
Abstract
We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist. Assoc. 74 (1979) 799--806], which involves sequential sampling from two populations: one whose characteristics are known, and one which depends on an unknown parameter and incorporates a covariate. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that involve suitable modifications of the myopic rule. It is shown that the regret, as well as the rate of sampling from the inferior population, can be finite or grow at various rates with the time horizon of the problem, depending on "local" properties of the covariate distribution. Proofs rely on martingale methods and information theoretic arguments.
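To make the setting concrete, the following is a minimal simulation sketch of a plain myopic rule in a one-armed bandit with a covariate. It is not the authors' policy or model. Every modeling choice here is an assumption made for illustration: the known population pays 0, the unknown population pays `theta * x` plus Gaussian noise, the covariate is uniform on [-1, 1], and the estimate of `theta` is a least-squares fit through the origin. The function name `simulate_myopic` and its parameters are hypothetical.

```python
import random


def simulate_myopic(theta, horizon, seed=0, noise_sd=1.0):
    """Simulate a plain myopic rule (illustrative model, not from the paper).

    Assumed model: the known arm pays 0; the unknown arm pays
    theta * x + Gaussian noise, with covariate x ~ Uniform(-1, 1).
    Returns (cumulative regret, number of pulls of the inferior arm).
    """
    rng = random.Random(seed)
    sum_xy = 0.0  # running sum of x * y over pulls of the unknown arm
    sum_xx = 0.0  # running sum of x * x over pulls of the unknown arm
    regret = 0.0
    inferior_pulls = 0

    for _ in range(horizon):
        x = rng.uniform(-1.0, 1.0)
        gain = theta * x  # expected reward of the unknown arm at this x

        if sum_xx == 0.0:
            # No data yet: sample the unknown arm once to start estimating.
            pull_unknown = True
        else:
            theta_hat = sum_xy / sum_xx  # least squares through the origin
            # Myopic rule: act as if the current estimate were the truth.
            pull_unknown = theta_hat * x > 0.0

        best = max(gain, 0.0)
        chosen = gain if pull_unknown else 0.0
        regret += best - chosen
        if chosen < best:
            inferior_pulls += 1

        if pull_unknown:
            # Only pulls of the unknown arm produce new information.
            y = gain + rng.gauss(0.0, noise_sd)
            sum_xy += x * y
            sum_xx += x * x

    return regret, inferior_pulls


if __name__ == "__main__":
    for horizon in (100, 1000, 10000):
        print(horizon, simulate_myopic(theta=0.5, horizon=horizon))
```

Running the loop over several horizons gives a rough feel for how regret and the count of inferior pulls behave under this assumed model. Changing the covariate distribution, for example how much mass it places near the point where the two arms are equally good, is one way to explore the kind of "local" dependence the abstract describes.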
