Bayesian Algorithms for Decentralized Stochastic Bandits

Anusha Lalitha; Andrea Goldsmith

arXiv:2010.10569 · stat.ML · October 29, 2020

TL;DR

This paper introduces decentralized Bayesian algorithms for multi-agent multi-armed bandit problems. It establishes near-optimal regret bounds and reports that the algorithms outperform existing ones in extensive numerical experiments.

Contribution

It extends Bayesian bandit algorithms to decentralized multi-agent settings with network communication, providing theoretical regret bounds and practical algorithms.

Findings

01. Decentralized Thompson Sampling matches centralized regret bounds.

02. Regret scales logarithmically with time, influenced by network structure.

03. Proposed algorithms outperform state-of-the-art UCB-inspired methods.

Abstract

We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network. In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps. In each round, agents choose an arm to play and subsequently send a message to their neighbors. The goal is to minimize cumulative regret averaged over the entire network. We propose a decentralized Bayesian multi-armed bandit framework that extends single-agent Bayesian bandit algorithms to the decentralized setting. Specifically, we study an information assimilation algorithm that can be combined with existing Bayesian algorithms, and using this, we propose a decentralized Thompson Sampling algorithm and a decentralized Bayes-UCB algorithm. We analyze the decentralized Thompson Sampling algorithm under Bernoulli rewards…
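The round structure described in the abstract (play an arm, then message neighbors) can be illustrated with a minimal Python sketch under Bernoulli rewards. This is not the paper's algorithm: the abstract does not specify the information assimilation step, so the sketch assumes each message is simply the (arm, reward) pair just observed and that receivers add it directly to their own Beta posterior. The function name `decentralized_thompson_sampling`, its parameters, and the ring network in the usage lines are hypothetical and chosen only for illustration.

```python
import random


def decentralized_thompson_sampling(means, neighbors, horizon, seed=0):
    """Toy decentralized Thompson Sampling with Bernoulli rewards.

    means:     true success probability of each of the K arms (same for all agents)
    neighbors: neighbors[i] lists the agents that receive agent i's message
    horizon:   number of rounds
    Returns the cumulative pseudo-regret averaged over the N agents.
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(neighbors), len(means)
    best = max(means)
    # Each agent keeps a Beta(alpha, beta) posterior per arm, starting at Beta(1, 1).
    alpha = [[1] * n_arms for _ in range(n_agents)]
    beta = [[1] * n_arms for _ in range(n_agents)]
    total_regret = 0.0

    for _ in range(horizon):
        messages = []
        for i in range(n_agents):
            # Thompson Sampling: draw one posterior sample per arm, play the argmax.
            samples = [rng.betavariate(alpha[i][k], beta[i][k]) for k in range(n_arms)]
            arm = max(range(n_arms), key=samples.__getitem__)
            reward = 1 if rng.random() < means[arm] else 0
            total_regret += best - means[arm]
            messages.append((arm, reward))

        # ASSUMPTION (not from the paper): the message is the raw (arm, reward)
        # pair, and the sender plus each neighbor fold it into their posterior.
        for i in range(n_agents):
            arm, reward = messages[i]
            for j in [i] + list(neighbors[i]):
                alpha[j][arm] += reward
                beta[j][arm] += 1 - reward

    return total_regret / n_agents


# Usage: 4 agents on a ring, 3 arms, 2000 rounds.
ring = [[(i - 1) % 4, (i + 1) % 4] for i in range(4)]
avg_regret = decentralized_thompson_sampling([0.2, 0.5, 0.8], ring, horizon=2000)
print(avg_regret)
```

The returned value corresponds to the objective named in the abstract, cumulative regret averaged over the network, computed here as expected (pseudo) regret from the true arm means rather than from realized rewards.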

Code & Models

Repositories

anushalalitha5/Decentralized-Thompson-Sampling
