Batched Bandits with Crowd Externalities

Romain Laroche; Othmane Safsafi; Raphael Feraud; Nicolas Broutin

arXiv:2109.14733 · cs.LG · October 1, 2021

TL;DR

This paper introduces a new variant of Batched Multi-Armed Bandits where the data received per batch influences the timing of policy updates, and proposes algorithms with provable regret bounds for this setting.

Contribution

The paper formulates a novel BMAB setting with crowd-dependent data, and develops algorithms with theoretical regret guarantees for this scenario.

Findings

01

Proposed a near-optimal policy with regret $\tilde{O}(\frac{1}{\sqrt{x}})$.

02

Designed a UCB-inspired algorithm with regret $\tilde{O}(\sqrt{T})$.

03

Proved that the regret bounds depend on the crowd size and the horizon.

Abstract

In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step. Usually, the setting imposes a maximum number of allowed policy updates, and the algorithm schedules them so as to minimize the expected regret. In this paper, we describe a novel setting for BMAB with the following twist: the timing of the policy update is not controlled by the BMAB algorithm; instead, the amount of data received during each batch, called the crowd, is influenced by the past selection of arms. We first design a near-optimal policy with approximate knowledge of the parameters, which we prove to have a regret in $O(\sqrt{\frac{\ln x}{x}} + \epsilon)$, where $x$ is the size of the crowd and $\epsilon$ is the parameter error. Next, we implement a UCB-inspired algorithm that guarantees an additional regret in $O(\max(K \ln T, \sqrt{T \ln T}))$, …
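
To make the setting concrete, the following is a minimal Python sketch of a batched bandit loop in which the batch size is driven by the arm that was played, rather than chosen by the algorithm. It is not the algorithm from the paper: it uses the standard UCB1 index, Bernoulli rewards, a single arm per batch, and a made-up crowd rule. The names `batched_ucb`, `crowd_fn`, `grow_on_arm_1`, and the initial crowd of 10 are all assumptions introduced for illustration.

```python
import math
import random


def batched_ucb(means, crowd_fn, n_batches, initial_crowd=10, seed=0):
    """Standard UCB1 with statistics refreshed only between batches.

    means    -- hypothetical Bernoulli reward means, one per arm
    crowd_fn -- hypothetical rule (previous crowd, arm played) -> next crowd
    Returns (pulls, wins, history), where history lists (arm, crowd) per batch.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k   # number of users served by each arm
    wins = [0] * k    # number of successes observed for each arm
    crowd = initial_crowd
    total = 0         # total number of users seen so far
    history = []

    for _ in range(n_batches):
        # The policy is frozen for the whole batch: pick one arm up front.
        if 0 in pulls:
            arm = pulls.index(0)  # play every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda a: wins[a] / pulls[a]
                + math.sqrt(2 * math.log(total) / pulls[a]),
            )

        # Every user in the crowd receives the same arm during this batch.
        successes = sum(rng.random() < means[arm] for _ in range(crowd))
        pulls[arm] += crowd
        wins[arm] += successes
        total += crowd
        history.append((arm, crowd))

        # Assumed externality: the next crowd depends on the arm just played.
        crowd = max(1, int(crowd_fn(crowd, arm)))

    return pulls, wins, history


def grow_on_arm_1(crowd, arm):
    """Made-up crowd rule: arm 1 attracts users, any other arm loses some."""
    return crowd + 5 if arm == 1 else crowd - 2


pulls, wins, history = batched_ucb([0.2, 0.8], grow_on_arm_1, n_batches=50)
print("users per arm:", pulls)
print("last batches (arm, crowd):", history[-3:])
```

The point of the sketch is only the control flow: the learner never decides when to update, because the size of each batch is a side effect of its earlier choices. Any realistic crowd model or regret analysis would need the definitions given in the paper itself.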

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Mobile Crowdsensing and Crowdsourcing