Online Multi-Armed Bandit
Uma Roy, Ashwath Thirmulai, Joe Zurier

TL;DR
This paper introduces a new online multi-armed bandit problem where each bandit can only be visited once, studies strategies for Bernoulli bandits with unknown means, and proves their near-optimality under various distributional assumptions.
Contribution
It formulates a novel single-visit bandit problem, proposes strategies that are optimal up to a constant factor, and bounds the performance of any optimal strategy, including in cases where the underlying distribution is unknown.
Findings
Proposed strategies are optimal up to a constant factor.
The performance of any optimal strategy is bounded.
Algorithms perform well even with unknown distributional parameters.
Abstract
We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In this online context, we study Bernoulli bandits (bandits with payout Ber(p) for some underlying mean p) with underlying means drawn i.i.d. from various distributions, including the uniform distribution, and in general, all distributions that have a CDF satisfying certain differentiability conditions near zero. In all cases, we suggest several strategies and investigate their expected performance. Furthermore, we bound the performance of any optimal strategy and show that the strategies we…
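To make the streaming setup concrete, here is a minimal simulation sketch in Python. It is not one of the paper's strategies. It rests on several assumptions that the abstract does not state: a fixed budget of pulls, no cost for moving on, means drawn from Uniform(0, 1), and a hypothetical stopping rule (`max_failures`) that abandons a bandit after a set number of zero payouts. The function names are illustrative only.

```python
import random


def play_stream(num_pulls, max_failures, rng):
    """Simulate one game and return the total reward collected.

    Assumptions (not from the source): a fixed budget of `num_pulls` pulls,
    moving to the next bandit is free, hidden means are Uniform(0, 1), and
    the player abandons a bandit after `max_failures` zero payouts.
    """
    total = 0
    pulls_left = num_pulls
    while pulls_left > 0:
        mean = rng.random()  # hidden mean of the newly streamed bandit
        failures = 0
        while pulls_left > 0 and failures < max_failures:
            reward = 1 if rng.random() < mean else 0  # Bernoulli payout
            total += reward
            pulls_left -= 1
            if reward == 0:
                failures += 1
        # Moving on: this bandit is discarded and can never be revisited.
    return total


def average_reward(num_pulls, max_failures, trials=2000, seed=0):
    """Estimate the mean reward per pull of the hypothetical rule."""
    rng = random.Random(seed)
    total = sum(play_stream(num_pulls, max_failures, rng) for _ in range(trials))
    return total / (trials * num_pulls)


if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, round(average_reward(200, k), 3))
```

Varying `max_failures` shows the trade-off the single-visit constraint creates: leaving a bandit too early wastes a possibly good arm, while staying too long spends the budget on a poor one.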
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
