Best Arm Identification with Contextual Information under a Small Gap
Masahiro Kato, Masaaki Imaizumi, Takuya Ishihara, Toru Kitagawa

TL;DR
This paper investigates the problem of identifying the best treatment arm in a contextual bandit setting with a fixed budget, focusing on the challenging small-gap scenario, and proposes an asymptotically optimal strategy.
Contribution
The paper derives lower bounds on the misidentification probability in small-gap regimes and introduces a novel RS-AIPW strategy (random sampling combined with an augmented inverse probability weighting estimator) that achieves asymptotic optimality.
Findings
Lower bounds on the misidentification probability are established for the small-gap regime.
RS-AIPW strategy matches the lower bounds asymptotically.
Strategy effectively handles small-gap regimes in contextual bandits.
Abstract
We study the best-arm identification (BAI) problem with a fixed budget and contextual (covariate) information. In each round of an adaptive experiment, after observing contextual information, we choose a treatment arm using past observations and current context. Our goal is to identify the best treatment arm, which is a treatment arm with the maximal expected reward marginalized over the contextual distribution, with a minimal probability of misidentification. In this study, we consider a class of nonparametric bandit models that converge to location-shift models when the gaps go to zero. First, we derive lower bounds of the misidentification probability for a certain class of strategies and bandit models (probabilistic models of potential outcomes) under a small-gap regime. A small-gap regime is a situation where gaps of the expected rewards between the best and suboptimal treatment…
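The round-by-round protocol described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' RS-AIPW strategy: it assumes a small discrete context set, Gaussian noise with unit variance, and uniform random arm sampling in place of the paper's sampling rule. The function name `run_aipw_bai` and its parameters `means` and `context_probs` are hypothetical. The recommendation step uses the standard AIPW score, which adds an inverse-probability-weighted residual to a plug-in regression estimate built from past rounds only.

```python
import random


def run_aipw_bai(T, means, context_probs, seed=0):
    """Simulate a fixed-budget contextual experiment and recommend an arm.

    means[a][x] is the (hypothetical) expected reward of arm a in context x.
    context_probs[x] is the probability of drawing context x.
    Returns (recommended_arm, per-arm AIPW estimates of the marginal mean).
    """
    rng = random.Random(seed)
    num_arms = len(means)
    num_ctx = len(context_probs)

    # Running sums and counts for plug-in regression estimates per (arm, context).
    sums = [[0.0] * num_ctx for _ in range(num_arms)]
    counts = [[0] * num_ctx for _ in range(num_arms)]
    aipw_totals = [0.0] * num_arms

    for _ in range(T):
        # Observe the context, then choose an arm.
        x = rng.choices(range(num_ctx), weights=context_probs)[0]
        # Assumption: uniform sampling probabilities (a stand-in, not the paper's rule).
        probs = [1.0 / num_arms] * num_arms
        a = rng.choices(range(num_arms), weights=probs)[0]
        # Assumption: Gaussian reward noise with unit variance.
        y = means[a][x] + rng.gauss(0.0, 1.0)

        # AIPW score for every arm, using regression estimates from past rounds only.
        for b in range(num_arms):
            f_hat = sums[b][x] / counts[b][x] if counts[b][x] > 0 else 0.0
            score = f_hat
            if b == a:
                score += (y - f_hat) / probs[b]
            aipw_totals[b] += score

        # Update the regression statistics after scoring.
        sums[a][x] += y
        counts[a][x] += 1

    estimates = [total / T for total in aipw_totals]
    best = max(range(num_arms), key=lambda b: estimates[b])
    return best, estimates


if __name__ == "__main__":
    best_arm, est = run_aipw_bai(2000, [[0.0, 0.0], [2.0, 3.0]], [0.5, 0.5])
    print(best_arm, est)
```

The example uses a deliberately large gap so that the recommendation is easy; the small-gap regime studied in the paper is exactly the case where the choice of sampling rule matters and a uniform rule like this one would generally not be optimal.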
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques · Economic Policies and Impacts
