Online Multi-Armed Bandit

Uma Roy; Ashwath Thirmulai; Joe Zurier

arXiv:1707.04987 · cs.AI · July 18, 2017



TL;DR

This paper introduces a new online multi-armed bandit problem where each bandit can only be visited once, studies strategies for Bernoulli bandits with unknown means, and proves their near-optimality under various distributional assumptions.

Contribution

It formulates a novel bandit problem in which each bandit can be visited only once, proposes strategies that are optimal up to a constant factor, and bounds the performance of any optimal strategy, including in cases where the underlying distribution is unknown.

Findings

01. The proposed strategies are optimal up to a constant factor.

02. The paper bounds the performance of any optimal strategy.

03. The algorithms perform well even when the distributional parameters are unknown.

Abstract

We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In this online context, we study Bernoulli bandits (bandits with payout $\mathrm{Ber}(p_i)$ for some underlying mean $p_i$) with underlying means drawn i.i.d. from various distributions, including the uniform distribution, and in general, all distributions that have a CDF satisfying certain differentiability conditions near zero. In all cases, we suggest several strategies and investigate their expected performance. Furthermore, we bound the performance of any optimal strategy and show that the strategies we…
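The streaming setup in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the means are drawn from Uniform(0, 1), the player has a fixed budget of pulls, moving on is assumed to cost nothing, and the "move on after the first failure" rule is a hypothetical strategy chosen for illustration. The function names (`play`, `never_move`, `move_after_first_failure`, `average_reward`) are likewise made up here.

```python
import random


def play(n_pulls, should_move_on, rng):
    """Spend n_pulls pulls on a stream of Bernoulli bandits and return the total reward.

    Assumptions (not taken from the paper): means are i.i.d. Uniform(0, 1),
    and moving on to the next bandit does not consume a pull.
    """
    total = 0
    p = rng.random()      # hidden mean of the current bandit
    failures = 0          # failures observed on the current bandit
    for _ in range(n_pulls):
        reward = 1 if rng.random() < p else 0
        total += reward
        failures += 1 - reward
        if should_move_on(failures):
            # The old bandit is abandoned for good; a fresh one is streamed in.
            p = rng.random()
            failures = 0
    return total


def never_move(failures):
    """Baseline: stay on the first bandit for the whole game."""
    return False


def move_after_first_failure(failures):
    """Hypothetical rule: abandon a bandit as soon as it fails once."""
    return failures >= 1


def average_reward(strategy, n_pulls=1000, trials=200, seed=0):
    """Average total reward of a strategy over independent simulated games."""
    rng = random.Random(seed)
    return sum(play(n_pulls, strategy, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("never move:", average_reward(never_move))
    print("move after first failure:", average_reward(move_after_first_failure))
```

In this sketch the player sees only observed rewards, never the mean itself, and a decision to move on cannot be undone, which is the feature that separates this problem from the classic setting where every arm stays available.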


Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems