Decentralized Multi-Agent Linear Bandits with Safety Constraints
Sanae Amani, Christos Thrampoulidis

TL;DR
This paper introduces a decentralized algorithm for multi-agent linear bandits that minimizes cumulative regret while accounting for communication costs and safety constraints, and that applies to arbitrary network topologies.
Contribution
It proposes DLUCB, a fully decentralized algorithm with near-optimal regret, and RC-DLUCB, a communication-efficient variant, and it extends the approach to safe bandit settings in distributed systems.
Findings
Achieves near-optimal regret of O(d log(NT) √(NT))
Reduces communication cost to O(d^3 N^{2.5}) with RC-DLUCB
Extends to linear bandits with safety constraints
Abstract
We study decentralized stochastic linear bandits, where a network of N agents acts cooperatively to efficiently solve a linear bandit-optimization problem over a d-dimensional space. For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network. At each round of the algorithm, each agent chooses its actions following an upper confidence bound (UCB) strategy, and agents share information with their immediate neighbors through a carefully designed consensus procedure that repeats over cycles. Our analysis adjusts the duration of these communication cycles, ensuring near-optimal regret performance at a communication rate of O(dN^2) per round. The structure of the network affects the regret performance via a small additive term - coined the regret of delay - that depends on the…
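The two ingredients named in the abstract, UCB action selection and neighbor-to-neighbor information sharing, can be illustrated with a minimal Python sketch. This is not the paper's DLUCB procedure: the helper names (`ucb_action`, `ring_weights`, `gossip_round`), the ring topology, the confidence width `beta`, and the use of plain repeated averaging in place of the paper's consensus procedure are all assumptions made for illustration.

```python
import numpy as np


def ucb_action(actions, gram, b, beta):
    """Return the index of the action with the largest upper confidence bound.

    actions: (K, d) candidate action vectors
    gram:    (d, d) regularized Gram matrix of past actions (hypothetical name)
    b:       (d,) sum of past actions weighted by observed rewards
    beta:    confidence-width parameter (assumed given, not derived here)
    """
    gram_inv = np.linalg.inv(gram)
    theta_hat = gram_inv @ b  # regularized least-squares estimate
    # Exploration bonus: beta * sqrt(x^T gram^{-1} x) for each action x.
    bonus = np.sqrt(np.einsum("kd,de,ke->k", actions, gram_inv, actions))
    return int(np.argmax(actions @ theta_hat + beta * bonus))


def ring_weights(n):
    """Doubly stochastic weights for a ring: 1/2 on self, 1/4 on each neighbor."""
    eye = np.eye(n)
    return 0.5 * eye + 0.25 * (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1))


def gossip_round(values, weights):
    """One averaging step: each agent mixes its value with its neighbors' values.

    values:  (N, ...) one entry per agent
    weights: (N, N) doubly stochastic matrix supported on the network's edges
    """
    return np.tensordot(weights, values, axes=(1, 0))


# Illustrative run: four agents on a ring agree on the average of their
# local statistics after repeated neighbor-only exchanges.
rng = np.random.default_rng(0)
local_b = rng.normal(size=(4, 3))  # each agent's local d=3 statistic
mixed = local_b.copy()
weights = ring_weights(4)
for _ in range(50):
    mixed = gossip_round(mixed, weights)

# Each agent can then act on the (approximately) shared statistic.
candidate_actions = np.eye(3)
choice = ucb_action(candidate_actions, np.eye(3), mixed[0], beta=1.0)
```

Because the weight matrix is doubly stochastic, every averaging step preserves the network-wide mean, so all agents drift toward the same value. How quickly they agree depends on the network structure, which is the intuition behind a delay-related term in the regret.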
Taxonomy
Topics: Age of Information Optimization · Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing
