Collaborative Learning of Stochastic Bandits over a Social Network
Ravi Kumar Kolla, Krishna Jagannathan, Aditya Gopalan

TL;DR
This paper studies how groups of interconnected agents can collaboratively learn to make optimal decisions in a stochastic bandit setting, highlighting the importance of network structure and proposing algorithms that leverage it for improved learning performance.
Contribution
It introduces a collaborative learning framework over social networks, analyzes the limitations of existing policies, and develops network-structure-aware algorithms with theoretical regret bounds.
Findings
Natural extensions of single-agent policies to the network setting can incur large regret.
Exploiting network structure, especially star motifs, improves learning efficiency.
The proposed algorithms achieve better regret bounds in networked settings.
Abstract
We consider a collaborative online learning paradigm, wherein a group of agents connected through a social network are engaged in playing a stochastic multi-armed bandit game. Each time an agent takes an action, the corresponding reward is instantaneously observed by the agent, as well as its neighbours in the social network. We perform a regret analysis of various policies in this collaborative learning setting. A key finding of this paper is that natural extensions of widely-studied single agent learning policies to the network setting need not perform well in terms of regret. In particular, we identify a class of non-altruistic and individually consistent policies, and argue by deriving regret lower bounds that they are liable to suffer a large regret in the networked setting. We also show that the learning performance can be substantially improved if the agents exploit the structure…
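The observation model in the abstract (each reward is seen by the acting agent and by its neighbours) can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it is a hypothetical, natural extension of the standard UCB1 index in which every agent builds its index from all samples it has observed. The function name `simulate_network_ucb`, the Bernoulli rewards, the exploration constant, and the star-shaped example network are all assumptions made for illustration.

```python
import math
import random


def simulate_network_ucb(adjacency, arm_means, horizon, seed=0):
    """Illustrative sketch: every agent runs a UCB1-style index on the
    samples it observes (its own pulls plus its neighbours' pulls).

    adjacency: dict mapping agent id (0..n-1) to a list of neighbour ids.
    arm_means: Bernoulli mean of each arm (assumed reward model).
    Returns (cumulative expected regret summed over agents, observation counts).
    """
    rng = random.Random(seed)
    n_agents = len(adjacency)
    n_arms = len(arm_means)
    counts = [[0] * n_arms for _ in range(n_agents)]   # observations per agent/arm
    sums = [[0.0] * n_arms for _ in range(n_agents)]   # observed reward totals
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        # All agents choose simultaneously, based on observations so far.
        choices = []
        for v in range(n_agents):
            unseen = [a for a in range(n_arms) if counts[v][a] == 0]
            if unseen:
                choices.append(unseen[0])
                continue

            def index(a, v=v):
                mean = sums[v][a] / counts[v][a]
                bonus = math.sqrt(2.0 * math.log(t) / counts[v][a])
                return mean + bonus

            choices.append(max(range(n_arms), key=index))

        # Each reward is revealed to the acting agent and to its neighbours.
        for v, arm in enumerate(choices):
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            regret += best_mean - arm_means[arm]
            for u in [v] + list(adjacency[v]):
                counts[u][arm] += 1
                sums[u][arm] += reward

    return regret, counts


# Hypothetical star network: agent 0 is the centre, agents 1-3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
total_regret, obs_counts = simulate_network_ucb(star, [0.9, 0.5, 0.4], horizon=200)
print(total_regret, [sum(row) for row in obs_counts])
```

In the star example, the centre observes every agent's reward in each round while a leaf sees only its own and the centre's, so the centre accumulates observations twice as fast as a leaf. That asymmetry is the kind of network structure the findings above refer to.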
