Woodroofe's one-armed bandit problem revisited

Alexander Goldenshluger; Assaf Zeevi

arXiv:0909.0119·math.PR·September 2, 2009

TL;DR

This paper revisits Woodroofe's one-armed bandit problem with a covariate, studies it in a minimax setting, and shows how the regret and the rate of sampling from the inferior population depend on local properties of the covariate distribution.

Contribution

It develops rate-optimal policies for the problem, extending the classical model to incorporate covariates and analyzing their impact on regret and sampling.

Findings

01

Regret and sampling rates vary with covariate distribution properties.

02

Proposed policies are rate-optimal and modify the myopic rule.

03

Regret can be finite or grow with the time horizon, depending on local properties of the covariate distribution.

Abstract

We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist. Assoc. 74 (1979) 799--806], which involves sequential sampling from two populations: one whose characteristics are known, and one which depends on an unknown parameter and incorporates a covariate. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that involve suitable modifications of the myopic rule. It is shown that the regret, as well as the rate of sampling from the inferior population, can be finite or grow at various rates with the time horizon of the problem, depending on "local" properties of the covariate distribution. Proofs rely on martingale methods and information-theoretic arguments.
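To make the terms "myopic rule", "regret", and "sampling from the inferior population" concrete, here is a minimal simulation sketch. It is not the authors' policy: the abstract says the rate-optimal policies modify the myopic rule, and this code runs only the plain, unmodified rule. The model is also an assumption made for illustration: the known population pays a reward of 0, the unknown population pays `theta * x` plus Gaussian noise, the covariate `x` is uniform on [-1, 1], and `theta` is estimated by least squares through the origin. The function name and all parameters are hypothetical.

```python
import random


def simulate_myopic(theta=0.5, horizon=1000, noise_sd=1.0, seed=0):
    """Run a plain myopic rule on an assumed linear one-armed bandit.

    Assumed model (not taken from the paper): the known arm pays 0; the
    unknown arm pays theta * x + Gaussian noise, with x ~ Uniform[-1, 1].
    Returns (cumulative regret, number of pulls of the inferior arm).
    """
    rng = random.Random(seed)
    sum_xy = 0.0  # running sum of x * y over pulls of the unknown arm
    sum_xx = 0.0  # running sum of x * x over pulls of the unknown arm
    regret = 0.0
    inferior_pulls = 0

    for _ in range(horizon):
        x = rng.uniform(-1.0, 1.0)

        # Least-squares estimate of theta from the data seen so far.
        theta_hat = sum_xy / sum_xx if sum_xx > 0.0 else 0.0

        # Myopic rule: pull the unknown arm when its estimated mean reward
        # beats the known reward of 0. With no data yet, force a pull.
        pull_unknown = (sum_xx == 0.0) or (theta_hat * x > 0.0)

        # The unknown arm is truly better exactly when theta * x > 0.
        unknown_is_better = theta * x > 0.0

        if pull_unknown != unknown_is_better:
            # Per-round regret is the gap between the two mean rewards.
            regret += abs(theta * x)
            inferior_pulls += 1

        if pull_unknown:
            # Only pulls of the unknown arm reveal information about theta.
            y = theta * x + rng.gauss(0.0, noise_sd)
            sum_xy += x * y
            sum_xx += x * x

    return regret, inferior_pulls


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        r, k = simulate_myopic(horizon=n)
        print(f"horizon={n}: regret={r:.3f}, inferior pulls={k}")
```

Under this assumed model, changing the covariate distribution (for example, how much mass it places near the point where the two mean rewards coincide) is one way to experiment with the dependence on "local" properties that the abstract describes.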
