Social Learning in Multi Agent Multi Armed Bandits
Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai

TL;DR
This paper presents a distributed multi-agent algorithm for stochastic multi-armed bandits that reduces regret and communication costs through limited, asynchronous gossip-based communication among agents.
Contribution
It introduces a novel decentralized algorithm in which agents collaborate by gossiping only arm-ids, not samples, which lowers per-agent regret while keeping each agent's communication to Θ(log(T)) messages over T rounds.
Findings
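Achieves per-agent regret of O((ceil(K/n) + log(n))/Δ * log(T) + log^3(n) log log(n)/Δ^2), with n agents, K arms, horizon T, and Δ the gap between the means of the best and second-best arms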
Each agent communicates only Θ(log(T)) times over T rounds
Improves regret over the no-communication benchmark and uses far less communication than the fully interactive benchmark
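To get a feel for the regret bound listed above, the following minimal sketch evaluates its two parts separately: the coefficient multiplying log(T) and the additive term that does not depend on T. All constants hidden by the O(·) notation are dropped, so the numbers indicate scaling only, not actual regret. The function names are hypothetical, and the single-agent comparison coefficient K/Δ is the standard UCB-style scaling assumed here for illustration, not a figure taken from this page.

```python
import math


def gossip_bound_terms(K, n, delta):
    """Return (log_T_coefficient, additive_term) of the stated bound.

    Constants are dropped; n must be at least 3 so that log(log(n)) > 0.
    """
    if n < 3:
        raise ValueError("n must be >= 3 for log(log(n)) to be positive")
    log_t_coeff = (math.ceil(K / n) + math.log(n)) / delta
    additive = (math.log(n) ** 3) * math.log(math.log(n)) / delta ** 2
    return log_t_coeff, additive


def no_communication_coeff(K, delta):
    """Assumed single-agent scaling K / delta of the log(T) term (constants dropped)."""
    return K / delta


if __name__ == "__main__":
    K, n, delta = 100, 100, 0.1
    coeff, additive = gossip_bound_terms(K, n, delta)
    print("log(T) coefficient with gossip:", round(coeff, 2))
    print("log(T) coefficient without communication:", no_communication_coeff(K, delta))
    print("T-independent additive term:", round(additive, 2))
```

With K = n = 100, the log(T) coefficient is governed by 1 + log(n) rather than by K, which is where the benefit of collaboration shows up for large T. The additive term can dominate for short horizons.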
Abstract
In this paper, we introduce a distributed version of the classical stochastic Multi-Armed Bandit (MAB) problem. Our setting consists of a large number of agents that collaboratively and simultaneously solve the same instance of a K-armed MAB to minimize the average cumulative regret over all agents. The agents can communicate and collaborate with each other only through a pairwise asynchronous gossip-based protocol that exchanges a limited number of bits. In our model, agents at each point decide (i) which arm to play, (ii) whether to communicate, and if so (iii) what to communicate and with whom. Agents in our model are decentralized, namely their actions depend only on their own observed history. We develop a novel algorithm in which agents, whenever they choose, communicate only arm-ids and not samples, with another agent chosen uniformly and independently at random. The…
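The communication pattern described in the abstract can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the paper's algorithm: the abstract is truncated here and does not say which arm-id is sent or when, so the sketch assumes that each agent runs UCB over a small active set of arms, that communication happens at rounds 2, 4, 8, ... (a doubling schedule, giving about log2(T) messages per agent), and that the message is the contacted peer's most-played arm. Rounds are synchronous for simplicity, whereas the paper's protocol is asynchronous. All names are hypothetical.

```python
import math
import random


def ucb_choice(arms, pulls, wins, t):
    """Pick an arm from the active set using a standard UCB index."""
    ordered = sorted(arms)
    for k in ordered:
        if pulls[k] == 0:  # play every active arm once first
            return k
    return max(
        ordered,
        key=lambda k: wins[k] / pulls[k] + math.sqrt(2 * math.log(t) / pulls[k]),
    )


def simulate(n_agents=8, means=(0.2, 0.4, 0.5, 0.9), horizon=2000, seed=0):
    rng = random.Random(seed)
    K = len(means)
    # Assumed initialisation: one starting arm per agent, assigned round-robin.
    active = [{a % K} for a in range(n_agents)]
    pulls = [[0] * K for _ in range(n_agents)]
    wins = [[0.0] * K for _ in range(n_agents)]
    messages = [0] * n_agents
    next_comm = 2  # assumed doubling schedule: rounds 2, 4, 8, ...

    for t in range(1, horizon + 1):
        for a in range(n_agents):
            arm = ucb_choice(active[a], pulls[a], wins[a], t)
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[a][arm] += 1
            wins[a][arm] += reward
        if t == next_comm:
            # Each agent's recommendation is its most-played arm so far.
            recs = [max(range(K), key=lambda k: pulls[a][k]) for a in range(n_agents)]
            for a in range(n_agents):
                peer = rng.choice([b for b in range(n_agents) if b != a])
                active[a].add(recs[peer])  # only an arm-id crosses the wire
                messages[a] += 1
            next_comm *= 2
    return active, pulls, messages


if __name__ == "__main__":
    active, pulls, messages = simulate()
    print("messages per agent:", messages)
    print("active sets:", active)
```

Only an integer arm index is exchanged at each contact, never a reward sample, and the number of contacts per agent grows logarithmically in the horizon under the assumed schedule.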
