Decentralized Multi-Agent Linear Bandits with Safety Constraints

Sanae Amani; Christos Thrampoulidis

arXiv:2012.00314 · cs.LG · December 2, 2020 · 1 citation



TL;DR

This paper introduces a decentralized algorithm for multi-agent linear bandits that minimizes cumulative regret while accounting for communication costs and safety constraints. The approach applies to arbitrary network topologies.

Contribution

The paper proposes DLUCB, a fully decentralized algorithm with near-optimal regret, and RC-DLUCB, a communication-efficient variant, and extends the approach to safe bandit settings in distributed systems.

Findings

01

Achieves near-optimal regret of $O(d \log(NT) \sqrt{NT})$

02

Reduces communication cost to $O(d^{3} N^{2.5})$ with RC-DLUCB

03

Extends to safe linear bandits, where chosen actions must satisfy safety constraints

Abstract

We study decentralized stochastic linear bandits, where a network of $N$ agents acts cooperatively to efficiently solve a linear bandit-optimization problem over a $d$-dimensional space. For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network. At each round of the algorithm each agent chooses its actions following an upper confidence bound (UCB) strategy and agents share information with their immediate neighbors through a carefully designed consensus procedure that repeats over cycles. Our analysis adjusts the duration of these communication cycles ensuring near-optimal regret performance $O(d \log(NT) \sqrt{NT})$ at a communication rate of $O(d N^{2})$ per round. The structure of the network affects the regret performance via a small additive term - coined the regret of delay - that depends on the…
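
The two ingredients named in the abstract, UCB action selection and neighbor-to-neighbor consensus, can be illustrated with a minimal sketch. This is not the authors' DLUCB implementation: it assumes a finite action set, a fixed confidence radius `beta`, and plain repeated averaging with a doubly stochastic mixing matrix `P` in place of the paper's carefully designed consensus procedure. The function names `consensus_average` and `ucb_action` and the example network are hypothetical.

```python
import numpy as np

def consensus_average(values, P, steps):
    """Mix each agent's local vector with its neighbors' for `steps` rounds.

    `values` has one row per agent. `P` is assumed to be a doubly stochastic
    mixing matrix whose nonzero entries match the network edges, so every
    row drifts toward the network-wide average.
    """
    x = np.asarray(values, dtype=float)
    for _ in range(steps):
        x = P @ x
    return x

def ucb_action(actions, A, b, beta):
    """Return the index of the action with the largest upper confidence bound.

    `A` is the regularized Gram matrix, `b` the reward-weighted sum of past
    actions, and `beta` an assumed fixed confidence radius.
    """
    theta_hat = np.linalg.solve(A, b)            # regularized least-squares estimate
    A_inv = np.linalg.inv(A)
    # Exploration width sqrt(x^T A^{-1} x) for every candidate action x.
    widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, A_inv, actions))
    scores = actions @ theta_hat + beta * widths
    return int(np.argmax(scores))

# Hypothetical ring of 4 agents: each keeps half its own value and
# gives a quarter to each of its two neighbors.
P = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
local = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0]])
mixed = consensus_average(local, P, steps=50)

# Two candidate actions; the first direction is well explored, the second is not.
actions = np.eye(2)
A = np.diag([100.0, 1.0])
b = np.array([10.0, 0.0])
choice = ucb_action(actions, A, b, beta=1.0)
```

In this toy run every row of `mixed` ends up close to the network average of the local vectors, and `choice` picks the under-explored second action because its confidence width outweighs the first action's estimated reward.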



Taxonomy

Topics: Age of Information Optimization · Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing