Distributed Thompson Sampling

Jing Dong; Tan Li; Shaolei Ren; Linqi Song

arXiv:2012.01789·cs.AI·September 10, 2021

Open Access

TL;DR

This paper introduces a distributed Thompson Sampling algorithm for cooperative multi-agent multi-armed bandits and shows how communication and collaborative elimination can reduce cumulative regret.

Contribution

It proposes a distributed elimination-based Thompson Sampling algorithm that uses communication among agents to improve regret bounds.

Findings

01. Communication among agents reduces the upper bound on regret.

02. Collaborative learning improves the performance of the proposed algorithm.

03. The paper derives a problem-dependent upper bound on cumulative regret under Bernoulli rewards.
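
For reference, cumulative regret in a cooperative bandit with M agents and K arms is commonly written as below. The notation is assumed for illustration and is not taken from the paper, which may define or normalize regret differently. A bound is usually called problem-dependent when it is expressed through the gaps \Delta_k rather than only through T, M, and K.

```latex
% Assumed notation (not from the source page):
%   \mu_k    mean reward of arm k
%   a_{m,t}  arm pulled by agent m in round t
%   T        number of rounds
R(T) = \mathbb{E}\left[ \sum_{m=1}^{M} \sum_{t=1}^{T} \left( \mu^{*} - \mu_{a_{m,t}} \right) \right],
\qquad \mu^{*} = \max_{1 \le k \le K} \mu_k,
\qquad \Delta_k = \mu^{*} - \mu_k .
```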

Abstract

We study a cooperative multi-agent multi-armed bandit problem with M agents and K arms. The goal of the agents is to minimize the cumulative regret. We adapt a traditional Thompson Sampling algorithm to the distributed setting. Because the agents can communicate, we note that communication may further reduce the upper bound on the regret of a distributed Thompson Sampling approach. To further improve the performance of distributed Thompson Sampling, we propose a distributed elimination-based Thompson Sampling algorithm that allows the agents to learn collaboratively. We analyse the algorithm under Bernoulli rewards and derive a problem-dependent upper bound on the cumulative regret.
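
The setting in the abstract can be illustrated with a minimal sketch of standard Beta-Bernoulli Thompson Sampling run by several agents. This is not the paper's algorithm: the elimination step is not described on this page and is left out, and the communication model here (all agents pooling their counts after every round) is an assumption made for illustration. The function name `run_thompson_sampling` and its parameters are hypothetical.

```python
import random


def run_thompson_sampling(true_means, num_agents, horizon, communicate, seed=0):
    """Simulate M agents playing a K-armed Bernoulli bandit with Thompson Sampling.

    Assumption (not from the paper): with communicate=True all agents share one
    posterior that is updated once per round; with communicate=False each agent
    keeps its own posterior. Returns (cumulative regret, total pulls per arm).
    """
    rng = random.Random(seed)
    num_arms = len(true_means)
    best_mean = max(true_means)

    # One posterior table if agents communicate, otherwise one per agent.
    num_tables = 1 if communicate else num_agents
    successes = [[0] * num_arms for _ in range(num_tables)]
    failures = [[0] * num_arms for _ in range(num_tables)]

    regret = 0.0
    pulls = [0] * num_arms

    for _ in range(horizon):
        observations = []
        for agent in range(num_agents):
            table = 0 if communicate else agent
            # Sample a mean for every arm from its Beta(1 + s, 1 + f) posterior.
            samples = [
                rng.betavariate(1 + successes[table][k], 1 + failures[table][k])
                for k in range(num_arms)
            ]
            arm = max(range(num_arms), key=samples.__getitem__)
            reward = 1 if rng.random() < true_means[arm] else 0
            observations.append((table, arm, reward))
            regret += best_mean - true_means[arm]
            pulls[arm] += 1
        # Posteriors are updated only after every agent has acted in this round.
        for table, arm, reward in observations:
            successes[table][arm] += reward
            failures[table][arm] += 1 - reward

    return regret, pulls


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical arm means
    for flag in (False, True):
        total_regret, arm_pulls = run_thompson_sampling(means, 4, 500, communicate=flag)
        print(f"communicate={flag}: regret={total_regret:.1f}, pulls={arm_pulls}")
```

Running the script prints the simulated regret with and without pooled observations, which gives a rough feel for the effect the paper analyses. A single seeded run is not evidence for the paper's bound.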

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms