Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit

Xiaotong Cheng; Setareh Maghsudi

arXiv:2306.05998 · cs.LG · June 12, 2023 · 1 citation


TL;DR

This paper introduces RBO-Coop-UCB, a multi-agent bandit algorithm with Bayesian change detection designed for dynamic environments with unknown change points; in experiments on synthetic and real datasets, it achieves lower regret than existing methods.
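The Bayesian change detection mentioned above can be illustrated with the textbook run-length recursion for Bayesian online change point detection. The sketch below assumes Bernoulli rewards, a Beta prior on each segment's mean and a constant hazard rate. The class `BernoulliBOCPD`, its parameters and the `recent_change_prob` rule are hypothetical names chosen for illustration; the paper's own detector may score and threshold changes differently. A restart could, for example, be triggered when `recent_change_prob(window)` exceeds a chosen threshold.

```python
from dataclasses import dataclass, field


@dataclass
class BernoulliBOCPD:
    """Bayesian online change point detection for 0/1 rewards (illustrative).

    Assumptions: Beta(alpha0, beta0) prior on each segment's mean and a
    constant hazard rate; this is the textbook run-length recursion, not
    necessarily the exact detector used in RBO-Coop-UCB.
    """
    hazard: float = 0.01
    alpha0: float = 1.0
    beta0: float = 1.0
    # Each entry: [posterior mass, successes, failures] for the hypothesis
    # "the current segment consists of the last (successes + failures) rewards".
    runs: list = field(default_factory=lambda: [[1.0, 0, 0]])

    def _predict(self, x: int, s: int, f: int) -> float:
        """Beta-Bernoulli posterior predictive P(x | s successes, f failures)."""
        p_one = (self.alpha0 + s) / (self.alpha0 + self.beta0 + s + f)
        return p_one if x == 1 else 1.0 - p_one

    def update(self, x: int) -> None:
        """Absorb one reward x in {0, 1} and renormalise the run-length posterior."""
        # Change point just before x: x opens a fresh segment scored under the prior.
        # (The old masses sum to 1, so the change mass is hazard * prior predictive.)
        new_runs = [[self.hazard * self._predict(x, 0, 0), x, 1 - x]]
        # No change: every existing segment grows by one observation.
        for mass, s, f in self.runs:
            growth = mass * (1.0 - self.hazard) * self._predict(x, s, f)
            new_runs.append([growth, s + x, f + 1 - x])
        total = sum(m for m, _, _ in new_runs)
        self.runs = [[m / total, s, f] for m, s, f in new_runs]

    def recent_change_prob(self, window: int) -> float:
        """Posterior probability that the current segment is at most `window` rewards long."""
        return sum(m for m, s, f in self.runs if s + f <= window)


if __name__ == "__main__":
    det = BernoulliBOCPD(hazard=0.001)
    for _ in range(200):
        det.update(1)  # a long stationary stretch of successes
    print(det.recent_change_prob(10))  # small: no recent change is supported
    for _ in range(10):
        det.update(0)  # the arm's mean appears to have dropped
    print(det.recent_change_prob(10))  # large: a recent change is now likely
```

Each update costs time linear in the number of tracked segments; a practical version would prune hypotheses whose mass becomes negligible.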

Contribution

It develops a novel multi-agent UCB algorithm integrated with Bayesian change point detection and a cooperative restart strategy for dynamic multi-armed bandit problems.
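To make the cooperative part concrete, here is a minimal sketch, assuming each agent pools its own samples with those of its graph neighbours into a UCB1-style index, and that a change flag raised by one agent also restarts its neighbours. `CoopUCBAgent`, `run_round` and `cooperative_restart` are hypothetical helpers; both the pooling rule and the restart rule are assumptions for illustration, not the exact RBO-Coop-UCB design.

```python
import math
import random


class CoopUCBAgent:
    """Hypothetical cooperative UCB agent (illustrative, not the paper's index).

    Assumption: the agent pools its own rewards with those of its graph
    neighbours and feeds all pooled samples into a UCB1-style index.
    """

    def __init__(self, n_arms: int) -> None:
        self.n_arms = n_arms
        self.reset()

    def reset(self) -> None:
        """Restart: discard all pooled statistics, e.g. after a detected change."""
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms

    def select_arm(self) -> int:
        # Sample every arm once (in index order) before trusting the index.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        total = sum(self.counts)

        def ucb(arm: int) -> float:
            mean = self.sums[arm] / self.counts[arm]
            return mean + math.sqrt(2.0 * math.log(total) / self.counts[arm])

        return max(range(self.n_arms), key=ucb)

    def absorb(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.sums[arm] += reward


def run_round(agents, neighbours, means, rng: random.Random) -> list:
    """One synchronous step: every agent pulls an arm, then shares with its neighbours."""
    choices = [agent.select_arm() for agent in agents]
    # Bernoulli rewards drawn with the arm means in force during this step.
    rewards = [1.0 if rng.random() < means[c] else 0.0 for c in choices]
    for i, agent in enumerate(agents):
        for j in [i, *neighbours[i]]:
            agent.absorb(choices[j], rewards[j])
    return choices


def cooperative_restart(agents, neighbours, flags) -> None:
    """Hypothetical rule: agent i restarts if it or any neighbour flagged a change."""
    for i, agent in enumerate(agents):
        if flags[i] or any(flags[j] for j in neighbours[i]):
            agent.reset()
```

In a complete graph every agent sees every sample, so all agents hold identical statistics; on sparser graphs the pooled estimates differ per agent, which is where the information-sharing structure matters.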

Findings

1. Theoretical regret bound of $\tilde{O}(KNM \log T + K\sqrt{MT \log T})$.
2. Outperforms state-of-the-art algorithms on synthetic and real datasets.
3. Effective in environments with multiple change points and shared information structures.

Abstract

We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment. A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points. All agents face the same piecewise-stationary MAB problem. The goal is to develop a decision-making policy for the agents that minimizes the regret, i.e., the expected total loss of not playing the optimal arm at each time step. Our proposed solution, Restarted Bayesian Online Change Point Detection in Cooperative Upper Confidence Bound Algorithm (RBO-Coop-UCB), has at its core an efficient multi-agent UCB algorithm enhanced with a Bayesian change point detector. We also develop a simple cooperative restart decision that improves decision-making. Theoretically, we establish that the expected group regret of RBO-Coop-UCB is…
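The regret defined in the abstract can be made concrete for a known piecewise-stationary schedule. The sketch below assumes the change points and per-segment arm means are known (in the bandit problem they are not, so this is only an evaluation tool) and sums, over time steps and agents, the gap between the best arm's mean and the played arm's mean. `means_at` and `group_regret` are hypothetical helpers, and the numbers in the usage lines are made up.

```python
import bisect
from typing import Sequence


def means_at(t: int, change_points: Sequence[int],
             segment_means: Sequence[Sequence[float]]) -> Sequence[float]:
    """Arm means in force at step t; change_points are the sorted steps where a new segment starts."""
    return segment_means[bisect.bisect_right(change_points, t)]


def group_regret(choices, change_points, segment_means) -> float:
    """Sum over steps and agents of (best mean - mean of the arm actually played).

    choices[t][k] is the arm played by agent k at step t. Because it uses the
    true means of the chosen arms, this is the pseudo-regret of a single run;
    the expected group regret averages it over the randomness of the policy.
    """
    total = 0.0
    for t, row in enumerate(choices):
        mu = means_at(t, change_points, segment_means)
        best = max(mu)
        total += sum(best - mu[arm] for arm in row)
    return total


# Two agents, two arms, one change point at step 3 (hypothetical numbers).
history = [[0, 0], [0, 1], [1, 1], [0, 1], [1, 1]]
regret = group_regret(history, change_points=[3], segment_means=[[0.9, 0.1], [0.2, 0.7]])
print(round(regret, 6))  # 2.9
```

Before step 3 arm 0 is optimal, so steps 1 and 2 cost 0.8 and 1.6; after the change arm 1 is optimal, so step 3 costs 0.5, giving 2.9 in total. An agent that keeps playing the previously optimal arm after a change keeps accruing regret, which is what change detection and restarts aim to limit.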


Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Optimization and Search Problems