Game of Thrones: Fully Distributed Learning for Multi-Player Bandits
Ilai Bistritz, Amir Leshem

TL;DR
This paper introduces a fully distributed algorithm for multi-player bandit problems in which players learn independently, without communication, and achieve a near-O(log T) expected sum of regrets relative to the optimal assignment of arms to players.
Contribution
It presents the first distributed algorithm to achieve near-logarithmic regret for multi-player bandits while requiring neither communication between players nor identical expected rewards across players.
Findings
Achieves a near-O(log T) expected sum of regrets in the fully distributed multi-player bandit setting.
First to handle player-specific expected rewards with no communication among players and no observation of other players' actions.
Proves that this regret bound is near order optimal for the fully distributed scenario.
Abstract
We consider an N-player multi-armed bandit game where each player chooses one out of M arms for T turns. Each player has different expected rewards for the arms, and the instantaneous rewards are independent and identically distributed or Markovian. When two or more players choose the same arm, they all receive zero reward. Performance is measured using the expected sum of regrets, compared with an optimal assignment of arms to players that maximizes the sum of expected rewards. We assume that each player only knows her actions and the reward she received each turn. Players cannot observe the actions of other players, and no communication between players is possible. We present a distributed algorithm and prove that it achieves an expected sum of regrets of near-O(log T). This is the first algorithm to achieve a near order optimal regret in this fully distributed scenario.…
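The setting in the abstract (player-specific expected rewards, zero reward on collision, regret measured against the best assignment of arms to players) can be illustrated with a small sketch. This is not the paper's algorithm; it only models the benchmark and the collision rule. The function names, the matrix `mu`, and the assumption that there are at least as many arms as players (N <= M) are all hypothetical choices made for illustration, and the optimal assignment is found by brute force, which is only practical for very small N and M.

```python
from itertools import permutations


def optimal_assignment(mu):
    """Brute-force the collision-free assignment with the largest total expected reward.

    mu[n][m] is the (assumed known, for benchmarking only) expected reward
    of player n on arm m. Assumes len(mu) <= len(mu[0]), i.e. N <= M.
    """
    n_players, n_arms = len(mu), len(mu[0])
    best_value, best_arms = float("-inf"), None
    # Each permutation gives every player a distinct arm, so no collisions.
    for arms in permutations(range(n_arms), n_players):
        value = sum(mu[n][arms[n]] for n in range(n_players))
        if value > best_value:
            best_value, best_arms = value, arms
    return best_arms, best_value


def expected_round_reward(mu, actions):
    """Sum of expected rewards in one turn; players sharing an arm get zero."""
    counts = {}
    for arm in actions:
        counts[arm] = counts.get(arm, 0) + 1
    return sum(mu[n][arm] for n, arm in enumerate(actions) if counts[arm] == 1)


def expected_sum_of_regrets(mu, action_history):
    """Accumulated gap between the optimal assignment and the actions actually played."""
    _, best_value = optimal_assignment(mu)
    return sum(best_value - expected_round_reward(mu, a) for a in action_history)


# Hypothetical example: 2 players, 3 arms.
mu = [
    [0.9, 0.5, 0.1],  # player 0
    [0.8, 0.2, 0.3],  # player 1
]
best_arms, best_value = optimal_assignment(mu)
# Turn 1: both players pick arm 0 and collide. Turn 2: they play the optimal assignment.
history = [(0, 0), (1, 0)]
regret = expected_sum_of_regrets(mu, history)
```

In this example both players individually prefer arm 0, so playing greedily yields a collision and zero reward, whereas the best joint assignment sends player 0 to her second-best arm. That gap is why the benchmark is an assignment over all players rather than each player's own best arm.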
