Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs

Lin Yang; Xuchuang Wang; Mohammad Hajiesmaili; Lijun Zhang; John C.S. Lui; Don Towsley

arXiv:2308.04314·cs.LG·August 9, 2023

Open Access

TL;DR

This paper introduces a new cooperative multi-agent bandit algorithm that achieves both optimal individual regret and constant communication costs, combining advantages of existing paradigms.

Contribution

It proposes a simple communication policy integrated into a learning algorithm, achieving the best of leader-follower and fully distributed approaches.

Findings

01 Achieves optimal individual regret.

02 Maintains constant communication costs.

03 Combines both guarantees in a single algorithm, which prior leader-follower and fully distributed algorithms do not.

Abstract

Recently, there has been extensive study of cooperative multi-agent multi-armed bandits, where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with optimal group and individual regrets and low communication between agents. Prior work has tackled this problem using two paradigms: leader-follower and fully distributed algorithms. Prior algorithms in both paradigms achieve the optimal group regret. The leader-follower algorithms achieve constant communication costs but fail to achieve optimal individual regrets. The state-of-the-art fully distributed algorithms achieve optimal individual regrets but fail to achieve constant communication costs. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both…
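To make the two regret notions in the abstract concrete, the following is a minimal sketch of the problem setting only, not the paper's algorithm or communication policy. It assumes Bernoulli arms, uses standard UCB1 as a stand-in learner, and lets each agent play with no communication at all; the function name `run_independent_ucb`, the arm means, the number of agents, and the horizon are all hypothetical choices for illustration. Individual regret is computed per agent and group regret as the sum over agents.

```python
import math
import random


def run_independent_ucb(means, num_agents, horizon, seed=0):
    """Simulate agents that each run UCB1 on the same Bernoulli bandit.

    Illustrative baseline with zero communication (an assumption of this
    sketch, not the paper's method). Returns each agent's pseudo-regret.
    """
    rng = random.Random(seed)
    best = max(means)
    k = len(means)
    regrets = []
    for _ in range(num_agents):
        counts = [0] * k   # number of pulls per arm
        sums = [0.0] * k   # total observed reward per arm
        regret = 0.0
        for t in range(1, horizon + 1):
            if t <= k:
                arm = t - 1  # pull every arm once to initialize
            else:
                # UCB1 index: empirical mean plus exploration bonus
                arm = max(
                    range(k),
                    key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2 * math.log(t) / counts[a]),
                )
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            regret += best - means[arm]  # pseudo-regret of this pull
        regrets.append(regret)
    return regrets


# Hypothetical instance: 3 arms, 4 agents, 2000 rounds each.
individual = run_independent_ucb([0.9, 0.5, 0.3], num_agents=4, horizon=2000)
group = sum(individual)  # group regret is the sum of individual regrets
print("individual regrets:", individual)
print("group regret:", group)
```

In this no-communication baseline every agent pays the full exploration cost on its own, so group regret grows with the number of agents. Cooperative algorithms of the kind the abstract describes aim to share that exploration across agents while keeping the number of exchanged messages small.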

Taxonomy

Topics: Advanced Bandit Algorithms Research · Privacy-Preserving Technologies in Data · Auction Theory and Applications
