Asymptotic Optimality for Decentralised Bandits

Conor Newton; Ayalvadi Ganesh; Henry W. J. Reeve

arXiv:2109.09427·cs.LG·September 21, 2021

TL;DR

This paper introduces a decentralised algorithm for multi-agent multi-armed bandit problems under communication constraints. Its regret guarantee is asymptotically optimal and matches the rate achievable in the full communication setting.

Contribution

It proposes a decentralised algorithm that builds upon and improves the Gossip-Insert-Eliminate method of Chawla et al. (arXiv:2001.05452), with a theoretical analysis proving that the regret incurred is asymptotically optimal.

Findings

01

The algorithm's regret guarantee matches the asymptotically optimal rate achievable in the full communication setting.

02

Theoretical analysis confirms optimality under communication constraints.

03

Empirical results support the theoretical claims.

Abstract

We consider a large number of agents collaborating on a multi-armed bandit problem with a large number of arms. The goal is to minimise the regret of each agent in a communication-constrained setting. We present a decentralised algorithm which builds upon and improves the Gossip-Insert-Eliminate method of Chawla et al. (arXiv:2001.05452). We provide a theoretical analysis of the regret incurred which shows that our algorithm is asymptotically optimal. In fact, our regret guarantee matches the asymptotically optimal rate achievable in the full communication setting. Finally, we present empirical results which support our conclusions.
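To make the setting in the abstract concrete, the following is a minimal toy sketch of gossip-style collaboration, not the paper's algorithm and not the exact Gossip-Insert-Eliminate procedure. It assumes Bernoulli arms with made-up means, a UCB1-style index, a fixed gossip period, and uniformly random message targets. It also omits the elimination step, so active sets only grow. All names (`simulate`, `ucb_index`, `gossip_every`) are hypothetical.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """UCB1-style index: empirical mean plus an exploration bonus."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def simulate(num_agents=4, num_arms=8, horizon=2000, gossip_every=100, seed=0):
    """Toy run: each agent plays UCB on a small active set and gossips its favourite arm."""
    rng = random.Random(seed)
    # Hypothetical Bernoulli arm means (an assumption); the last arm is the best.
    means = [0.1 + 0.8 * k / (num_arms - 1) for k in range(num_arms)]
    best = max(means)

    # Each agent starts with a small active set of arms (round-robin split).
    active = [set(range(a, num_arms, num_agents)) for a in range(num_agents)]
    pulls = [[0] * num_arms for _ in range(num_agents)]
    sums = [[0.0] * num_arms for _ in range(num_agents)]
    regret = [0.0] * num_agents  # per-agent pseudo-regret (sum of gaps)

    for t in range(1, horizon + 1):
        for a in range(num_agents):
            arms = sorted(active[a])
            untried = [k for k in arms if pulls[a][k] == 0]
            if untried:
                # Pull every active arm once before using the index.
                arm = untried[0]
            else:
                arm = max(
                    arms,
                    key=lambda k: ucb_index(sums[a][k] / pulls[a][k], pulls[a][k], t),
                )
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[a][arm] += 1
            sums[a][arm] += reward
            regret[a] += best - means[arm]

        if t % gossip_every == 0:
            # Communication constraint: one arm index per agent per gossip round.
            messages = []
            for a in range(num_agents):
                favourite = max(sorted(active[a]), key=lambda k: pulls[a][k])
                target = rng.choice([b for b in range(num_agents) if b != a])
                messages.append((target, favourite))
            for target, favourite in messages:
                active[target].add(favourite)  # insert only; no elimination here

    return regret, active


if __name__ == "__main__":
    regret, active = simulate()
    for a, (r, s) in enumerate(zip(regret, active)):
        print(f"agent {a}: pseudo-regret {r:.1f}, active arms {sorted(s)}")
```

Each agent sends a single arm index per gossip round instead of its full reward history, which is the kind of restriction a communication-constrained setting imposes. Whether and how fast the best arm spreads in this toy depends on the assumed parameters and says nothing about the paper's guarantees.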
