Batched Bandits with Crowd Externalities
Romain Laroche, Othmane Safsafi, Raphael Feraud, Nicolas Broutin

TL;DR
This paper introduces a new variant of Batched Multi-Armed Bandits in which the algorithm does not control when the policy is updated, and the amount of data received per batch is influenced by past arm selections. It proposes algorithms with provable regret bounds for this setting.
Contribution
The paper formulates a novel Batched Multi-Armed Bandit (BMAB) setting in which the amount of data per batch (the crowd) depends on past arm selections, and develops algorithms with theoretical regret guarantees for this setting.
Findings
Proposed a near-optimal policy with regret $\tilde{O}(\frac{1}{\sqrt{x}})$.
Designed a UCB-inspired algorithm with regret $\tilde{O}(\sqrt{T})$.
Proved regret bounds that depend on the crowd size and the horizon.
Abstract
In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step. Usually, the setting asserts a maximum number of allowed policy updates, and the algorithm schedules them so as to minimize the expected regret. In this paper, we describe a novel setting for BMAB, with the following twist: the timing of the policy update is not controlled by the BMAB algorithm; instead, the amount of data received during each batch, called the crowd, is influenced by the past selection of arms. We first design a near-optimal policy with approximate knowledge of the parameters that we prove to have a regret in $\tilde{O}(\frac{1}{\sqrt{x}})$ in the crowd size $x$, with an additional dependence on the parameter error. Next, we implement a UCB-inspired algorithm that guarantees an additional regret in $\tilde{O}(\sqrt{T})$ in the horizon $T$,…
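To make the setting concrete, here is a minimal simulation sketch. It is not the authors' algorithm. The crowd model is a hypothetical assumption: each batch's crowd size equals a fixed number of newcomers plus the number of successes in the previous batch, so a poor arm choice shrinks the next batch. The policy is a generic UCB1-style rule that is frozen within each batch and assigns the whole crowd to one arm. The names `simulate`, `ucb_index`, `pseudo_regret`, `BatchLog` and `newcomers` are illustrative only.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class BatchLog:
    arm: int        # arm played for the whole batch
    crowd: int      # number of users (observations) in the batch
    successes: int  # number of Bernoulli rewards equal to 1


def ucb_index(total: float, count: int, t: int) -> float:
    """UCB1-style index; untried arms get +inf so each arm is tried once."""
    if count == 0:
        return math.inf
    return total / count + math.sqrt(2.0 * math.log(t) / count)


def simulate(means, num_batches, newcomers=10, seed=0):
    """Batched Bernoulli bandit with a hypothetical crowd externality.

    Assumed crowd model (not from the paper): crowd_{b+1} = newcomers + successes_b.
    The policy is recomputed only at batch boundaries.
    """
    rng = random.Random(seed)
    k = len(means)
    totals = [0.0] * k  # summed rewards per arm
    counts = [0] * k    # observations per arm
    crowd = newcomers
    logs = []
    for _ in range(num_batches):
        t = max(1, sum(counts))  # observations collected so far
        # max() keeps the first maximal arm, so untried arms are tried in index order.
        arm = max(range(k), key=lambda a: ucb_index(totals[a], counts[a], t))
        successes = sum(1 for _ in range(crowd) if rng.random() < means[arm])
        totals[arm] += successes
        counts[arm] += crowd
        logs.append(BatchLog(arm, crowd, successes))
        crowd = newcomers + successes  # past arm choice drives the next crowd size
    return logs


def pseudo_regret(means, logs):
    """Expected reward lost versus always playing the best arm, per user."""
    best = max(means)
    return sum(log.crowd * (best - means[log.arm]) for log in logs)
```

For example, `logs = simulate([0.3, 0.5, 0.7], num_batches=50)` followed by `pseudo_regret([0.3, 0.5, 0.7], logs)` reports the simulated loss. Note that this per-user pseudo-regret ignores the crowd lost to bad choices. A regret notion that accounts for the externality would also need to count those users.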
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Mobile Crowdsensing and Crowdsourcing
