Cooperative Multi-Agent Constrained Stochastic Linear Bandits

Amirhossein Afsharrad; Parisa Oftadeh; Ahmadreza Moradipari; Sanjay Lall

arXiv:2410.17382 · cs.LG · October 24, 2024

TL;DR

This paper introduces a distributed algorithm for multi-agent stochastic linear bandits that collaboratively minimize regret while respecting cost constraints, leveraging local communication and consensus.

Contribution

The paper proposes MA-OPLB, a safe distributed upper confidence bound (UCB) algorithm for networked multi-agent settings, together with a high-probability bound on its regret.

Findings

01

A regret bound of order O((d/(τ-c₀))(log(NT)²/√N)√(T/|λ₂|)) is established.

02

The algorithm performs well across different network structures.

03

Effective consensus-based estimation improves multi-agent bandit performance.
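
To make the idea of consensus-based estimation concrete, here is a minimal sketch of plain consensus averaging, in which each agent repeatedly replaces its value with a weighted average of its neighbours' values. It is not the MA-OPLB estimator. The ring network, the equal 1/3 weights, the starting values, and the helper names (`ring_weights`, `consensus_step`, `run_consensus`) are all assumptions chosen for illustration.

```python
def ring_weights(n):
    """Doubly stochastic weights for a ring of n >= 3 agents (assumed topology).

    Each agent gives weight 1/3 to itself and to each of its two neighbours.
    """
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            weights[i][j % n] += 1.0 / 3.0
    return weights


def consensus_step(weights, values):
    """One round of local communication: every agent averages its neighbourhood."""
    n = len(values)
    return [sum(weights[i][j] * values[j] for j in range(n)) for i in range(n)]


def run_consensus(weights, values, steps):
    """Apply the averaging step repeatedly and return the final local estimates."""
    for _ in range(steps):
        values = consensus_step(weights, values)
    return values


# Hypothetical local quantities held by five agents.
local_values = [1.0, 4.0, 2.0, 7.0, 6.0]
network_average = sum(local_values) / len(local_values)

estimates = run_consensus(ring_weights(5), local_values, steps=50)
max_gap = max(abs(x - network_average) for x in estimates)
print(max_gap)
```

Because the weight matrix is doubly stochastic, the sum of the values is preserved at every step, and every agent's estimate approaches the network-wide average. For averaging of this kind, the rate of approach is governed by the second-largest eigenvalue magnitude of the weight matrix.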

Abstract

In this study, we explore a collaborative multi-agent stochastic linear bandit setting involving a network of $N$ agents that communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $τ$. Each agent encounters a distinct linear bandit problem characterized by its own reward and cost parameters, i.e., local parameters. The goal of the agents is to determine the best overall action corresponding to the average of these parameters, the so-called global parameters. In each round, an agent is randomly chosen to select an action based on its current knowledge of the system. This chosen action is then executed by all agents, who then observe their individual rewards and costs. We propose a safe distributed upper confidence bound algorithm, called MA-OPLB, and establish a high-probability bound on its $T$-round regret.…
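
The interaction protocol described in the abstract can be sketched as a small simulation. This is a hedged illustration of the setting only, not of MA-OPLB: the action-selection rule is a placeholder that samples uniformly from the actions that are feasible under the global cost parameter, which a real learner would not know. The finite action set, the Gaussian noise level, the parameter values, and the names `simulate`, `dot`, and `mean_vector` are all assumptions.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors):
    """Coordinate-wise average, used to form the global parameters."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def simulate(local_rewards, local_costs, actions, tau, rounds, seed=0):
    """Run the round structure from the abstract with a placeholder policy."""
    rng = random.Random(seed)
    n_agents = len(local_rewards)
    theta_bar = mean_vector(local_rewards)  # global reward parameter
    mu_bar = mean_vector(local_costs)       # global cost parameter

    # Actions whose expected global cost stays under the threshold tau.
    feasible = [a for a in actions if dot(mu_bar, a) <= tau]
    best_reward = max(dot(theta_bar, a) for a in feasible)

    regret = 0.0
    history = []
    for _ in range(rounds):
        chooser = rng.randrange(n_agents)  # one agent is chosen at random
        action = rng.choice(feasible)      # placeholder, NOT the paper's rule
        # Every agent executes the action and sees its own noisy reward and cost.
        observations = [
            (dot(local_rewards[i], action) + rng.gauss(0.0, 0.1),
             dot(local_costs[i], action) + rng.gauss(0.0, 0.1))
            for i in range(n_agents)
        ]
        regret += best_reward - dot(theta_bar, action)
        history.append((chooser, action, observations))
    return regret, history


# Hypothetical local parameters for three agents in two dimensions.
local_rewards = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
local_costs = [[0.2, 0.4], [0.4, 0.2], [0.3, 0.3]]
actions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0], [1.0, 1.0]]

regret, history = simulate(local_rewards, local_costs, actions, tau=0.35, rounds=20)
print(regret)
```

Regret here is measured against the best feasible action under the averaged (global) parameters, matching the goal stated in the abstract. A learning algorithm would replace the placeholder line with a rule built from each agent's own estimates.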

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Data Stream Mining Techniques