Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

Sébastien Bubeck; Yuanzhi Li; Yuval Peres; Mark Sellke

arXiv:1904.12233·cs.LG·May 3, 2019·5 cites

TL;DR

This paper studies the non-stochastic cooperative multi-player multi-armed bandit problem with no communication between players. It gives the first $\sqrt{T}$-type regret bound when collisions are announced to the colliding players, and the first sublinear bound, $T^{1-\frac{1}{2m}}$ for $m$ players, when collision information is unavailable.

Contribution

It establishes the first $\tilde{O}(\sqrt{T})$ regret bound with collision information and the first sublinear regret bound without collision information in non-stochastic multi-player bandits.

Findings

01

Achieved $\tilde{O}(\sqrt{T})$ regret with collision information.

02

Established $T^{1-\frac{1}{2m}}$ regret without collision information, where $m$ is the number of players.

03

First to provide regret guarantees in non-stochastic multi-player bandits.
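
To make the second finding concrete, the short sketch below evaluates the exponent of $T$ in the $T^{1-\frac{1}{2m}}$ rate for a few values of $m$. It is plain arithmetic on the stated rate and ignores constants and any logarithmic factors; the function name `no_collision_info_exponent` is a hypothetical helper, not something from the paper.

```python
from fractions import Fraction


def no_collision_info_exponent(m: int) -> Fraction:
    """Exponent of T in the T^(1 - 1/(2m)) rate for m players (hypothetical helper)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1 - Fraction(1, 2 * m)


# Print the exponent for a few player counts.
for m in (1, 2, 3, 10):
    print(m, no_collision_info_exponent(m))
```

For $m = 1$ the exponent is $1/2$, matching the $\sqrt{T}$ rate; for larger $m$ it moves toward 1 but stays strictly below it, so the rate remains sublinear for every fixed $m$.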

Abstract

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, under the feedback model where collisions are announced to the colliding players. Such a bound was not known even for the simpler stochastic version. We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely $T^{1 - \frac{1}{2 m}}$ where $m$ is the number of players.
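
The collision rule in the abstract can be illustrated with a minimal sketch of a single round. It assumes losses are normalized to $[0, 1]$, so that the "maximal loss" is 1; that normalization, the function name `play_round`, and its return format are illustrative assumptions rather than details taken from the paper. The sketch only models the environment, not the authors' algorithms.

```python
from collections import Counter
from typing import List, Sequence, Tuple

MAX_LOSS = 1.0  # assumption: losses are normalized to [0, 1]


def play_round(
    actions: Sequence[int], arm_losses: Sequence[float]
) -> Tuple[List[float], List[bool]]:
    """Return each player's loss and collision flag for one round.

    actions[i] is the arm chosen by player i; arm_losses[a] is the
    adversary's loss for arm a in this round.
    """
    counts = Counter(actions)
    # A player collides when at least one other player chose the same arm.
    collided = [counts[a] > 1 for a in actions]
    # Colliding players suffer the maximal loss instead of the arm's loss.
    losses = [MAX_LOSS if c else arm_losses[a] for a, c in zip(actions, collided)]
    return losses, collided


losses, collided = play_round(actions=[0, 2, 2], arm_losses=[0.1, 0.5, 0.3])
print(losses)    # [0.1, 1.0, 1.0]
print(collided)  # [False, True, True]
```

In the feedback model with collision information, player `i` would observe both `losses[i]` and `collided[i]`. Without collision information, the player would observe only `losses[i]` and could not tell a collision apart from an arm whose loss happens to be maximal.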

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems