Asymptotic Optimality for Decentralised Bandits
Conor Newton, Ayalvadi Ganesh, Henry W. J. Reeve

TL;DR
This paper introduces a decentralised algorithm for multi-agent multi-armed bandit problems whose regret is asymptotically optimal, matching the rate achievable with full communication despite operating under communication constraints.
Contribution
It proposes a decentralised algorithm that builds upon and improves the Gossip-Insert-Eliminate method of Chawla et al., together with a theoretical regret analysis proving asymptotic optimality.
Findings
The regret of each agent is asymptotically optimal in the communication-constrained setting.
The regret guarantee matches the asymptotically optimal rate achievable in the full communication setting.
Empirical results support the theoretical claims.
Abstract
We consider a large number of agents collaborating on a multi-armed bandit problem with a large number of arms. The goal is to minimise the regret of each agent in a communication-constrained setting. We present a decentralised algorithm which builds upon and improves the Gossip-Insert-Eliminate method of Chawla et al. (arXiv:2001.05452). We provide a theoretical analysis of the regret incurred which shows that our algorithm is asymptotically optimal. In fact, our regret guarantee matches the asymptotically optimal rate achievable in the full communication setting. Finally, we present empirical results which support our conclusions.
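To make the quantity being minimised concrete, the sketch below computes the cumulative pseudo-regret of a single agent on Bernoulli arms. It is an illustration only: it uses the standard single-agent UCB1 rule, not the paper's decentralised algorithm or Gossip-Insert-Eliminate, and the function name `ucb1_pseudo_regret`, the arm means, and the horizon are all hypothetical choices made for this example.

```python
import math
import random


def ucb1_pseudo_regret(means, horizon, seed=0):
    """Run single-agent UCB1 on Bernoulli arms (illustrative, not the paper's method).

    Returns the cumulative pseudo-regret and the number of pulls per arm.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    best = max(means)          # mean reward of the optimal arm
    pulls = [0] * n_arms       # how often each arm has been played
    totals = [0.0] * n_arms    # summed rewards per arm
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # play every arm once to initialise
        else:
            # Choose the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        # Pseudo-regret: expected reward lost by not playing the best arm.
        regret += best - means[arm]
    return regret, pulls


if __name__ == "__main__":
    regret, pulls = ucb1_pseudo_regret([0.9, 0.1], horizon=2000)
    print("pseudo-regret:", regret, "pulls per arm:", pulls)
```

In the multi-agent setting described in the abstract, each agent would track a regret of this kind, and the communication constraint limits how much the agents can share to reduce it.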
