Distributed Thompson Sampling
Jing Dong, Tan Li, Shaolei Ren, Linqi Song

TL;DR
This paper introduces a distributed Thompson Sampling algorithm for multi-agent multi-armed bandits, demonstrating how communication and collaborative elimination can reduce cumulative regret in a multi-agent setting.
Contribution
It proposes a novel distributed Elimination-based Thompson Sampling algorithm that uses communication among agents to improve the regret bound.
Findings
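Communication among agents can reduce the upper bound on cumulative regret.
The proposed elimination-based algorithm lets agents learn collaboratively and is designed to improve on a direct distributed adaptation of Thompson Sampling.
A problem-dependent upper bound on cumulative regret is derived under Bernoulli rewards.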
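This page does not state the paper's exact regret definition. A standard one for M agents, K arms with means \mu_1, \dots, \mu_K, and horizon T (assumed here, with a_{m,t} the arm agent m pulls at round t and N_k(T) the total pulls of arm k across all agents) is shown below. Problem-dependent bounds are usually stated in terms of the gaps \Delta_k:

```latex
R(T) = \sum_{m=1}^{M} \sum_{t=1}^{T} \left( \mu^{*} - \mu_{a_{m,t}} \right),
\qquad \mu^{*} = \max_{k \in \{1,\dots,K\}} \mu_k,
\qquad \mathbb{E}[R(T)] = \sum_{k=1}^{K} \Delta_k \, \mathbb{E}[N_k(T)],
\quad \Delta_k = \mu^{*} - \mu_k
```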
Abstract
We study a cooperative multi-agent multi-armed bandit problem with M agents and K arms. The goal of the agents is to minimize the cumulative regret. We adapt a traditional Thompson Sampling algorithm to the distributed setting. Since the agents can communicate, we note that communication may further reduce the upper bound on the regret of a distributed Thompson Sampling approach. To further improve the performance of distributed Thompson Sampling, we propose a distributed Elimination-based Thompson Sampling algorithm that allows the agents to learn collaboratively. We analyse the algorithm under Bernoulli rewards and derive a problem-dependent upper bound on the cumulative regret.
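To make the two ideas in the abstract concrete (agents sharing observations, and eliminating arms collaboratively), here is a minimal simulation sketch. It is not the authors' algorithm: the full broadcast of every observation after each round, the Beta(1, 1) priors, the Hoeffding-based elimination rule, and the function name distributed_elimination_ts with its parameters are all hypothetical choices made for illustration.

```python
import math
import random


def distributed_elimination_ts(means, num_agents, horizon, delta=1e-6, seed=0):
    """Hypothetical distributed elimination-based Thompson Sampling sketch.

    Assumptions (not from the paper): all agents broadcast every observation
    at the end of each round, so they share pooled Beta posteriors, and arms
    are eliminated using Hoeffding confidence bounds on pooled empirical means.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # pooled count of reward-1 observations per arm
    failures = [0] * k   # pooled count of reward-0 observations per arm
    active = set(range(k))
    best_mean = max(means)
    regret = 0.0

    for _ in range(horizon):
        round_obs = []
        for _agent in range(num_agents):
            # Thompson Sampling: sample Beta(1 + s, 1 + f) for each active arm
            # and pull the arm with the largest sample.
            arm = max(
                active,
                key=lambda a: rng.betavariate(1 + successes[a], 1 + failures[a]),
            )
            reward = 1 if rng.random() < means[arm] else 0  # Bernoulli reward
            round_obs.append((arm, reward))
            regret += best_mean - means[arm]  # pseudo-regret of this pull

        # Communication step (assumed): every agent sees this round's data.
        for arm, reward in round_obs:
            if reward:
                successes[arm] += 1
            else:
                failures[arm] += 1

        # Elimination step (swapped-in Hoeffding rule): drop any arm whose
        # upper bound falls below the largest lower bound among active arms.
        bounds = {}
        for a in active:
            n = successes[a] + failures[a]
            if n == 0:
                bounds[a] = (0.0, 1.0)
                continue
            mean = successes[a] / n
            width = math.sqrt(math.log(2 * k * horizon / delta) / (2 * n))
            bounds[a] = (mean - width, mean + width)
        best_lower = max(low for low, _ in bounds.values())
        active = {a for a in active if bounds[a][1] >= best_lower}

    pulls = [successes[a] + failures[a] for a in range(k)]
    return regret, pulls, active
```

For example, distributed_elimination_ts([0.9, 0.5, 0.3, 0.1], num_agents=4, horizon=2000) returns the pseudo-regret summed over all agents, the pooled pull counts per arm, and the set of arms that survived elimination. Because each round adds M observations to the shared posteriors instead of one, every agent's estimates sharpen faster than they would in isolation; the paper's actual communication protocol and bound may differ.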
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
