Social Learning in Multi Agent Multi Armed Bandits

Abishek Sankararaman; Ayalvadi Ganesh; Sanjay Shakkottai

arXiv:1910.02100 · cs.LG · November 6, 2019

TL;DR

This paper presents a distributed multi-agent algorithm for stochastic multi-armed bandits that reduces regret and communication costs through limited, asynchronous gossip-based communication among agents.

Contribution

The paper introduces a decentralized algorithm in which agents share only arm-ids through pairwise gossip, lowering per-agent regret relative to no collaboration while limiting each agent to Θ(log(T)) communications.

Findings

01

Achieves per-agent regret of $O\left( \frac{\lceil K/n \rceil + \log(n)}{\Delta} \log(T) + \frac{\log^3(n) \log\log(n)}{\Delta^2} \right)$

02

Each agent communicates only $\Theta(\log(T))$ times over $T$ rounds

03

Reduces per-agent regret relative to the no-communication benchmark, and communication cost relative to the full-interaction benchmark
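
To get a feel for the regret finding, the sketch below evaluates the stated bound numerically and sets it beside the familiar (K/Δ)·log(T) scaling of a single agent learning alone. It is only an illustration: the hidden constants in the O(·) notation are set to 1, Δ is read as the gap between the best and second-best arm means (this page does not define it), and the function names are hypothetical.

```python
import math


def collaborative_bound(K, n, delta, T):
    """Stated per-agent bound with every hidden constant set to 1 (an assumption)."""
    if n < 3:
        raise ValueError("need n >= 3 so that log(log(n)) is positive")
    time_dependent = (math.ceil(K / n) + math.log(n)) / delta * math.log(T)
    time_independent = math.log(n) ** 3 * math.log(math.log(n)) / delta ** 2
    return time_dependent + time_independent


def no_collaboration_scale(K, delta, T):
    """Standard (K / delta) * log(T) scaling for one agent, constant set to 1."""
    return K / delta * math.log(T)


if __name__ == "__main__":
    K, n, delta, T = 1000, 100, 0.1, 10**6
    print("collaborative  :", round(collaborative_bound(K, n, delta, T)))
    print("no collaboration:", round(no_collaboration_scale(K, delta, T)))
```

With many arms and many agents, the term that grows with T depends on ceil(K/n) + log(n) rather than on K, which is where the saving comes from. The second term does not grow with T.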

Abstract

In this paper, we introduce a distributed version of the classical stochastic Multi-Arm Bandit (MAB) problem. Our setting consists of a large number of agents $n$ that collaboratively and simultaneously solve the same instance of a $K$-armed MAB to minimize the average cumulative regret over all agents. The agents can communicate and collaborate with each other only through a pairwise asynchronous gossip-based protocol that exchanges a limited number of bits. In our model, agents at each point decide (i) which arm to play, (ii) whether to communicate, and if so (iii) what to communicate and with whom. Agents in our model are decentralized, namely their actions depend only on their observed history. We develop a novel algorithm in which agents, whenever they choose, communicate only arm-ids and not samples, with another agent chosen uniformly and independently at random. The…
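
The model in the abstract can be made concrete with a small simulation. This is a minimal sketch under several assumptions, not the authors' algorithm: each agent runs UCB on a small active set of arms, communication happens at rounds 2, 4, 8, ... (one simple way to get about log(T) messages), and an agent answers a request with the id of its most-played arm. The Bernoulli rewards, the doubling schedule, the recommendation rule, and the name `simulate` are all illustrative choices.

```python
import math
import random


def simulate(n=20, K=40, T=2000, seed=0):
    rng = random.Random(seed)
    means = [0.5] * K
    means[0] = 0.9  # arm 0 is the best arm (illustrative instance)

    # Each agent starts with ceil(K/n) arms; together the agents cover all arms.
    per_agent = math.ceil(K / n)
    active = [{(i * per_agent + j) % K for j in range(per_agent)} for i in range(n)]
    pulls = [[0] * K for _ in range(n)]
    sums = [[0.0] * K for _ in range(n)]
    messages = [0] * n
    regret = 0.0
    next_comm = 2  # assumed schedule: communicate at rounds 2, 4, 8, ...

    def ucb_index(i, a, t):
        return sums[i][a] / pulls[i][a] + math.sqrt(2 * math.log(t) / pulls[i][a])

    for t in range(1, T + 1):
        for i in range(n):
            untried = [a for a in active[i] if pulls[i][a] == 0]
            if untried:
                arm = untried[0]
            else:
                arm = max(active[i], key=lambda a: ucb_index(i, a, t))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[i][arm] += 1
            sums[i][arm] += reward
            regret += means[0] - means[arm]

        if t == next_comm:
            recommendations = []
            for i in range(n):
                j = rng.randrange(n - 1)
                if j >= i:
                    j += 1  # uniformly random agent other than i
                # Only an arm-id is exchanged, never reward samples.
                recommendations.append(max(active[j], key=lambda a: pulls[j][a]))
                messages[i] += 1
            for i, arm_id in enumerate(recommendations):
                active[i].add(arm_id)
            next_comm *= 2

    return regret / n, messages, active


if __name__ == "__main__":
    avg_regret, messages, active = simulate()
    print("per-agent regret:", avg_regret)
    print("messages per agent:", messages[0])
    print("agents holding the best arm:", sum(0 in s for s in active))
```

In this sketch the active sets only grow, which keeps the code short. A practical scheme would also need a rule for discarding arms so that each agent's set stays small.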
