Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
Sébastien Bubeck, Yuanzhi Li, Yuval Peres, Mark Sellke

TL;DR
This paper studies the non-stochastic cooperative multi-player multi-armed bandit problem with no communication between players. It proves the first $\sqrt{T}$-type regret guarantee when collisions are announced to the colliding players, and the first sublinear regret guarantee when collision information is unavailable.
Contribution
It establishes the first $\tilde{O}(\sqrt{T})$ regret bound with collision information and the first sublinear regret bound without collision information in non-stochastic multi-player bandits.
Findings
Achieved $\tilde{O}(\sqrt{T})$ regret with collision information.
Established $T^{1-\frac{1}{2m}}$ regret without collision information, where $m$ is the number of players.
The $\sqrt{T}$-type guarantee is the first for this problem; such a bound was not known even for the simpler stochastic version.
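To make the no-collision-information rate concrete, the short sketch below evaluates the exponent $1-\frac{1}{2m}$ for a few player counts. The function name `no_collision_info_exponent` is hypothetical and introduced only for illustration; the formula itself is the one stated in the findings.

```python
from fractions import Fraction


def no_collision_info_exponent(m: int) -> Fraction:
    """Exponent of T in the T^(1 - 1/(2m)) bound for m players.

    Hypothetical helper for illustration; only the formula comes from the summary.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1 - Fraction(1, 2 * m)


if __name__ == "__main__":
    for m in (1, 2, 3, 10):
        # The exponent stays strictly below 1 (sublinear) but approaches 1 as m grows.
        print(f"m = {m}: regret exponent = {no_collision_info_exponent(m)}")
```

The exponent is always strictly less than 1, so the bound is sublinear for every fixed $m$, but it moves toward 1 as the number of players increases.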
Abstract
We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, under the feedback model where collisions are announced to the colliding players. Such a bound was not known even for the simpler stochastic version. We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$, where $m$ is the number of players.
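The two feedback models described in the abstract can be illustrated with a minimal sketch of a single round. This is not the paper's algorithm, only a toy model of the environment. It assumes losses are normalized to $[0, 1]$ so that the "maximal loss" on a collision is 1; the names `play_round` and `observe` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def play_round(choices: List[int], losses: List[float]) -> List[Tuple[float, bool]]:
    """Return (loss, collided) for each player in one round.

    choices[i] is the arm picked by player i; losses[a] is the adversary's
    loss for arm a this round (assumed to lie in [0, 1]).
    """
    counts = Counter(choices)
    results = []
    for arm in choices:
        collided = counts[arm] > 1
        # Assumption: a collision yields the maximal loss, taken to be 1.0.
        loss = 1.0 if collided else losses[arm]
        results.append((loss, collided))
    return results


def observe(result: Tuple[float, bool], collision_info: bool) -> Tuple[float, Optional[bool]]:
    """What a player sees under each feedback model."""
    loss, collided = result
    # Without collision information the player sees only the loss, so a
    # collision cannot be told apart from an arm whose loss happens to be 1.
    return (loss, collided) if collision_info else (loss, None)


if __name__ == "__main__":
    outcome = play_round(choices=[0, 0, 2], losses=[0.2, 0.5, 0.9])
    for player, result in enumerate(outcome):
        print(player, observe(result, True), observe(result, False))
```

In this toy round, players 0 and 1 both pick arm 0 and each incur loss 1, while player 2 incurs the adversary's loss for arm 2. Only under the first feedback model do the colliding players learn that the loss came from a collision.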
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
