Collaborative Multi-agent Stochastic Linear Bandits

Ahmadreza Moradipari, Mohammad Ghavamzadeh, and Mahnoosh Alizadeh

arXiv:2205.06331·cs.LG·May 16, 2022

TL;DR

This paper introduces a distributed UCB algorithm for multi-agent stochastic linear bandits, leveraging local communication to minimize regret, with a proven high-probability regret bound that accounts for communication costs.

Contribution

The paper proposes a distributed UCB algorithm for multi-agent linear bandits, with a theoretical regret guarantee that accounts for communication overhead.

Findings

01

Regret bound of order $\mathcal{O}\big(\sqrt{T/(N \log(1/|\lambda_2|))}\,(\log T)^2\big)$

02

Effective consensus-based reward estimation among agents

03

Each communication round contributes a linear growth term to the regret bound

Abstract

We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter) and the goal is to select the best global action w.r.t. the average of their reward parameters. At each round, each agent proposes an action, and one action is randomly selected and played as the network action. All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents. We propose a distributed upper confidence bound (UCB) algorithm and prove a high probability bound on its $T$-round regret in which we include a linear growth of regret associated with each communication round. Our regret bound is of order…
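The consensus step described in the abstract can be illustrated with a small sketch. This is not the paper's procedure: the abstract specifies an accelerated consensus, whereas the code below uses plain repeated neighbor averaging as a simpler stand-in. The ring topology, the mixing weights, and the names `ring_gossip_matrix`, `consensus`, and `second_largest_abs_eigenvalue` are all assumptions made for illustration. The sketch only shows how local averaging drives every agent's value toward the network-wide mean, and how the second-largest eigenvalue modulus of the mixing matrix (the quantity one would expect $|\lambda_2|$ in the regret order to refer to) controls how fast that happens.

```python
import numpy as np

def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n agents.

    Hypothetical choice: each agent keeps weight 1/2 on itself and puts
    weight 1/4 on each of its two ring neighbors.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def consensus(W, values, rounds):
    """Plain (non-accelerated) consensus: x <- W x, repeated `rounds` times."""
    x = np.asarray(values, dtype=float).copy()
    for _ in range(rounds):
        x = W @ x  # each agent averages with its neighbors only
    return x

def second_largest_abs_eigenvalue(W):
    """Second-largest eigenvalue modulus of a symmetric mixing matrix."""
    moduli = np.sort(np.abs(np.linalg.eigvalsh(W)))
    return moduli[-2]

n = 6
W = ring_gossip_matrix(n)
rewards = np.array([1.0, 0.0, 2.0, 4.0, 3.0, 2.0])  # one made-up reward per agent
estimates = consensus(W, rewards, rounds=60)
lam2 = second_largest_abs_eigenvalue(W)

print("network average:", rewards.mean())
print("per-agent estimates:", estimates)
print("|lambda_2|:", lam2)
```

For a symmetric doubly stochastic matrix, the distance to the true average shrinks by at least a factor of $|\lambda_2|$ per round of plain averaging, so a better-connected network (smaller $|\lambda_2|$) needs fewer communication rounds for the same accuracy. An accelerated scheme, as used in the paper, would reach a given accuracy in fewer rounds than this plain iteration.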

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Stochastic Gradient Optimization Techniques