The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication
Allen Liu, Mark Sellke

TL;DR
This paper characterizes the fundamental trade-offs in achieving optimal instance-dependent regret in multi-player multi-armed bandits without communication, revealing limitations and proposing a generalized algorithm resilient to adversarial feedback.
Contribution
It characterizes the complete Pareto frontier of regret guarantees achievable without communication and introduces a new topological lower-bound technique.
Findings
Without communication, optimal regret cannot be achieved for all gaps simultaneously: attaining it for some gaps forces strictly sub-optimal regret for others.
The paper characterizes all Pareto-optimal trade-offs in regret.
A generalized algorithm achieves these trade-offs even with adversarial feedback.
Abstract
We study the stochastic multi-player multi-armed bandit problem. In this problem, $m$ players cooperate to maximize their total reward from $K > m$ arms. However, the players cannot communicate and are penalized (e.g. receive no reward) if they pull the same arm at the same time. We ask whether it is possible to obtain optimal instance-dependent regret $\tilde{O}(1/\Delta)$, where $\Delta$ is the gap between the $m$-th and $(m+1)$-st best arms. Such guarantees were recently achieved in a model allowing the players to implicitly communicate through intentional collisions. Surprisingly, we show that with no communication at all, such guarantees are not achievable. In fact, obtaining the optimal $\tilde{O}(1/\Delta)$ regret for some values of $\Delta$ necessarily implies strictly sub-optimal regret in other regimes. Our main result is a complete characterization of the Pareto optimal…
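The collision model in the abstract can be made concrete with a small simulation. This is a minimal sketch, not the authors' algorithm: it assumes Bernoulli rewards and that colliding players receive zero reward (the "no reward" penalty the abstract gives as an example). The helper names `play_round`, `gap`, and `expected_round_regret` are hypothetical, chosen for illustration. Regret here is measured per round against the best $m$ distinct arms, and `gap` computes $\Delta$ as the difference between the $m$-th and $(m+1)$-st largest means.

```python
import random
from collections import Counter


def play_round(arm_means, choices, rng):
    """Simulate one round of the (assumed) no-communication collision model.

    arm_means: Bernoulli mean of each arm (assumed reward distribution).
    choices:   choices[i] is the arm pulled by player i this round.
    Returns each player's reward; players sharing an arm get 0.
    """
    load = Counter(choices)  # how many players pulled each arm
    rewards = []
    for arm in choices:
        if load[arm] > 1:
            rewards.append(0.0)  # collision penalty: no reward
        else:
            rewards.append(1.0 if rng.random() < arm_means[arm] else 0.0)
    return rewards


def gap(arm_means, m):
    """Delta: difference between the m-th and (m+1)-st largest arm means."""
    ordered = sorted(arm_means, reverse=True)
    return ordered[m - 1] - ordered[m]


def expected_round_regret(arm_means, choices):
    """Per-round expected regret against the best m distinct arms.

    Only arms pulled by exactly one player contribute their mean;
    colliding players contribute nothing.
    """
    m = len(choices)
    best = sum(sorted(arm_means, reverse=True)[:m])
    load = Counter(choices)
    achieved = sum(arm_means[a] for a in choices if load[a] == 1)
    return best - achieved


if __name__ == "__main__":
    means = [0.9, 0.8, 0.5, 0.3]  # K = 4 arms, illustrative values
    rng = random.Random(0)
    print(play_round(means, [0, 0], rng))        # both players collide
    print(gap(means, 2))                         # Delta for m = 2
    print(expected_round_regret(means, [0, 2]))  # best and third-best arms
```

The per-round regret shows why collisions are costly: two players on the same arm forfeit both rewards, so without communication the players must coordinate onto distinct top-$m$ arms purely through their own observations.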
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Game Theory and Applications
