Cooperative Multi-Agent Constrained Stochastic Linear Bandits
Amirhossein Afsharrad, Parisa Oftadeh, Ahmadreza Moradipari, Sanjay Lall

TL;DR
This paper introduces a distributed algorithm for multi-agent stochastic linear bandits in which agents collaboratively minimize regret while respecting cost constraints, using local communication and consensus.
Contribution
It proposes MA-OPLB, a safe distributed UCB algorithm with theoretical regret bounds and practical effectiveness in networked multi-agent settings.
Findings
Regret bound of order O((d/(τ-c₀))(log(NT)²/√N)√(T/|λ₂|)) established.
Algorithm performs well across different network structures.
Effective consensus-based estimation improves multi-agent bandit performance.
Abstract
In this study, we explore a collaborative multi-agent stochastic linear bandit setting involving a network of agents that communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold τ. Each agent encounters a distinct linear bandit problem characterized by its own reward and cost parameters, i.e., local parameters. The goal of the agents is to determine the best overall action corresponding to the average of these parameters, the so-called global parameters. In each round, an agent is randomly chosen to select an action based on its current knowledge of the system. The chosen action is then executed by all agents, after which they observe their individual rewards and costs. We propose a safe distributed upper confidence bound algorithm, called MA-OPLB, and establish a high-probability bound on its T-round regret.…
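The abstract defines the global parameters as the average of the agents' local parameters, and the TL;DR mentions consensus over local communication. The sketch below illustrates that idea only. It uses plain gossip averaging on a ring network, which is an assumption chosen for illustration and not necessarily the consensus update MA-OPLB uses. The names `ring_gossip_matrix`, `consensus_round`, and `run_consensus`, the weights (1/2 self, 1/4 per neighbor), and the values of `N` and `d` are all hypothetical.

```python
import random


def ring_gossip_matrix(n):
    """Doubly stochastic weights for a ring (assumed topology):
    weight 1/2 on the agent itself and 1/4 on each of its two neighbors."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] += 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def consensus_round(W, values):
    """One communication round: every agent replaces its vector with a
    weighted average of its own vector and its neighbors' vectors."""
    n, d = len(values), len(values[0])
    return [
        [sum(W[i][j] * values[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


def run_consensus(W, values, rounds):
    """Repeat the averaging step for a fixed number of rounds."""
    for _ in range(rounds):
        values = consensus_round(W, values)
    return values


rng = random.Random(0)
N, d = 6, 3  # hypothetical network size and parameter dimension

# Each agent starts with its own local parameter vector.
local_params = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(N)]

# The global parameter is the network-wide average of the local ones.
global_param = [sum(p[k] for p in local_params) / N for k in range(d)]

W = ring_gossip_matrix(N)
estimates = run_consensus(W, local_params, rounds=100)

# Largest gap between any agent's estimate and the true average.
max_error = max(
    abs(estimates[i][k] - global_param[k]) for i in range(N) for k in range(d)
)
print(max_error)
```

For this weight matrix the error shrinks geometrically at a rate set by the second-largest eigenvalue modulus |λ₂| of `W`, the same network quantity that appears in the regret bound listed under Findings.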
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Data Stream Mining Techniques
