TL;DR
This paper introduces decentralized Bayesian algorithms for multi-agent multi-armed bandit problems, proves near-optimal regret bounds for them, and shows through extensive numerical experiments that they outperform existing algorithms.
Contribution
The paper extends single-agent Bayesian bandit algorithms to decentralized multi-agent settings in which agents communicate over a network, and provides both theoretical regret bounds and practical algorithms.
Findings
Decentralized Thompson Sampling matches centralized regret bounds.
Regret scales logarithmically with time and is influenced by the network structure.
Proposed algorithms outperform state-of-the-art UCB-inspired methods.
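The findings are stated in terms of cumulative regret averaged over the whole network. The notation below is assumed rather than taken from this summary (N agents, arm means mu_k, best mean mu*, and a_{i,t} for the arm agent i plays in round t). Under that notation, one standard way to write the network-averaged regret and the stated logarithmic scaling is:

```latex
% Assumed notation: N agents, arms k = 1..K with means \mu_k,
% \mu^* = \max_k \mu_k, and a_{i,t} = arm played by agent i in round t.
R(T) = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \mathbb{E}\left[ \mu^{*} - \mu_{a_{i,t}} \right],
\qquad
R(T) = O(\log T)
% The constants hidden in O(\cdot) depend on the reward gaps and on the
% network structure; their exact form is given in the paper, not here.
```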
Abstract
We study a decentralized cooperative multi-agent multi-armed bandit problem with arms and agents connected over a network. In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps. In each round, agents choose an arm to play and subsequently send a message to their neighbors. The goal is to minimize cumulative regret averaged over the entire network. We propose a decentralized Bayesian multi-armed bandit framework that extends single-agent Bayesian bandit algorithms to the decentralized setting. Specifically, we study an information assimilation algorithm that can be combined with existing Bayesian algorithms, and using this, we propose a decentralized Thompson Sampling algorithm and a decentralized Bayes-UCB algorithm. We analyze the decentralized Thompson Sampling algorithm under Bernoulli rewards…
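To make the round structure described in the abstract concrete (play an arm, then message neighbors), the following Python sketch simulates decentralized Thompson Sampling with Bernoulli rewards. It assumes Beta(1, 1) priors and a deliberately simple, hypothetical assimilation rule: each agent folds its own and its one-hop neighbors' latest (arm, reward) observations into its Beta posterior. This is not the paper's information assimilation algorithm, whose details are not given in this summary. The function name `decentralized_thompson_sampling` and its parameters are illustrative only.

```python
import random


def decentralized_thompson_sampling(means, neighbors, horizon, seed=0):
    """Hypothetical sketch of decentralized Thompson Sampling with Bernoulli arms.

    means:     success probability of each arm (identical for all agents)
    neighbors: neighbors[i] lists the agents adjacent to agent i (excluding i)
    horizon:   number of rounds T
    Returns the network-averaged cumulative pseudo-regret after `horizon` rounds.
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(neighbors), len(means)
    best = max(means)
    # Every agent keeps its own Beta(1, 1) prior on every arm.
    alpha = [[1.0] * n_arms for _ in range(n_agents)]
    beta = [[1.0] * n_arms for _ in range(n_agents)]
    total_regret = 0.0

    for _ in range(horizon):
        # Step 1: each agent draws one posterior sample per arm and plays the argmax.
        plays = []
        for i in range(n_agents):
            samples = [rng.betavariate(alpha[i][k], beta[i][k]) for k in range(n_arms)]
            arm = max(range(n_arms), key=lambda k: samples[k])
            reward = 1 if rng.random() < means[arm] else 0  # independent draw per agent
            plays.append((arm, reward))
            total_regret += best - means[arm]

        # Step 2 (assumed assimilation rule): each agent updates its posterior with
        # its own observation and the (arm, reward) messages from its neighbors.
        for i in range(n_agents):
            for j in (i, *neighbors[i]):
                arm, reward = plays[j]
                alpha[i][arm] += reward
                beta[i][arm] += 1 - reward

    return total_regret / n_agents


# Usage: 4 agents on a ring, 3 Bernoulli arms.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
print(decentralized_thompson_sampling([0.2, 0.5, 0.8], ring, horizon=2000))
```

Setting every entry of `neighbors` to an empty list turns the sketch into independent single-agent Thompson Sampling, which is a convenient baseline for judging how much sharing observations over the network helps.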
