Multiplayer bandits without observing collision information

Gabor Lugosi; Abbas Mehrabian

arXiv:1808.08416·cs.LG·April 6, 2021


TL;DR

This paper studies multiplayer stochastic bandits in which players who pull the same arm collide and receive zero reward. It gives the first theoretical regret guarantees for the setting where collisions are not observed, and an algorithm for quickly reaching approximate Nash equilibria in stochastic anti-coordination games.

Contribution

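The paper gives the first regret guarantees for multiplayer bandits without collision information, the first gap-independent square-root regret bounds for the model with collision information, and an algorithm for reaching approximate Nash equilibria in stochastic anti-coordination games.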

Findings

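1. A logarithmic-regret algorithm for the model without collision information
2. Square-root-type regret bounds that do not depend on the gaps between the means, for both feedback models
3. Fast convergence to approximate Nash equilibria in stochastic anti-coordination games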

Abstract

We study multiplayer stochastic multi-armed bandit problems in which the players cannot communicate and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred and a more difficult setup when no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with a logarithmic regret, and an algorithm with a square-root regret type that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anti-coordination games.
