Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards

Mengfan Xu; Diego Klabjan

arXiv:2306.05579 · cs.LG · October 19, 2023 · 2 cites

TL;DR

This paper introduces a decentralized multi-agent multi-armed bandit algorithm that handles heterogeneous rewards and random, time-varying communication graphs, achieving near-optimal regret bounds with high probability.

Contribution

It proposes a novel framework combining robust graph simulation, consensus averaging, and UCB techniques, removing the need for doubly stochastic graphs and handling diverse reward distributions.
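The combination of consensus averaging and UCB can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses plain equal-weight neighbor averaging in place of the paper's weighting technique, and the function names `neighbor_average` and `ucb_choice` and the exploration constant `c` are hypothetical choices for illustration.

```python
import math

import numpy as np


def neighbor_average(estimates: np.ndarray, adjacency: np.ndarray) -> np.ndarray:
    """One consensus step: each client averages its per-arm estimates with its neighbors'.

    estimates: shape (M, K); row i holds client i's estimate of each arm's global mean.
    adjacency: shape (M, M); symmetric 0/1 matrix for this round's communication graph.
    Equal weights over {i} plus neighbors(i) make the weight matrix row-stochastic,
    but generally not doubly stochastic.
    """
    weights = adjacency.astype(float)  # copy, so the caller's matrix is not modified
    np.fill_diagonal(weights, 1.0)  # every client keeps its own estimate
    weights /= weights.sum(axis=1, keepdims=True)  # normalize each row to sum to 1
    return weights @ estimates


def ucb_choice(estimates_i: np.ndarray, counts_i: np.ndarray, t: int, c: float = 2.0) -> int:
    """Return the arm maximizing estimate + sqrt(c * log t / n); unpulled arms go first."""
    counts = np.asarray(counts_i, dtype=float)
    unpulled = np.flatnonzero(counts == 0)
    if unpulled.size > 0:
        return int(unpulled[0])  # pull every arm once before using the index
    bonus = np.sqrt(c * math.log(t) / counts)  # exploration bonus shrinks as n grows
    return int(np.argmax(estimates_i + bonus))
```

In one round, a client would average with its current neighbors, call `ucb_choice` on its own row of the averaged estimates, pull that arm, and update its counts. Equal-weight averaging is used here only because it is the simplest row-stochastic choice; on a general graph it drives the clients toward a non-uniform weighted average rather than the plain mean over clients, which is presumably the issue the paper's weighting technique is designed to avoid without doubly stochastic weights.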

Findings

01

Achieves optimal $\log T$ regret bounds in sub-Gaussian and sub-exponential environments.

02

Provides high-probability regret bounds that account for graph randomness.

03

Handles heterogeneous rewards across clients without requiring prior knowledge of the reward distributions.
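One common way to formalize the system-level objective with heterogeneous rewards is to measure regret against the arm whose reward, averaged over all $M$ clients, is highest. The notation below is assumed for illustration, and the paper's exact definition may differ. Here $\mu_k^{i}$ is the mean reward of arm $k$ at client $i$ and $a_t^{i}$ is the arm client $i$ pulls at time $t$:

$$
\mu_k = \frac{1}{M}\sum_{i=1}^{M} \mu_k^{i}, \qquad
k^\ast = \arg\max_{k} \mu_k, \qquad
R_T = \sum_{t=1}^{T} \sum_{i=1}^{M} \left( \mu_{k^\ast} - \mu_{a_t^{i}} \right).
$$

Under this formalization, the first finding corresponds to a bound of the form $R_T = O(\log T)$, which the second finding states holds with high probability over the random graphs.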

Abstract

We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment. The reward distributions of each arm vary across clients, and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-Gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaboration. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to…
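The random-graph option mentioned in the abstract can be sketched as follows, assuming an Erdős–Rényi G(M, p) model in which each pair of clients is linked independently with probability p. Resampling until the graph is connected, the `max_tries` cap, and all function names are hypothetical choices for illustration, not the paper's simulation procedure; the Markov-chain alternative is not shown.

```python
import random
from collections import deque


def erdos_renyi(m: int, p: float, rng: random.Random) -> list[set[int]]:
    """Sample an undirected G(m, p) graph as adjacency sets; each pair is linked with probability p."""
    adj: list[set[int]] = [set() for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_connected(adj: list[set[int]]) -> bool:
    """Breadth-first search from client 0; True if every client is reached."""
    if not adj:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(adj)


def sample_connected_graph(
    m: int, p: float, rng: random.Random, max_tries: int = 1000
) -> list[set[int]]:
    """Resample G(m, p) until the graph is connected (hypothetical robustness rule)."""
    for _ in range(max_tries):
        adj = erdos_renyi(m, p, rng)
        if is_connected(adj):
            return adj
    raise RuntimeError("no connected graph sampled; increase p or max_tries")
```

For example, `rng = random.Random(0)` followed by `graphs = [sample_connected_graph(10, 0.3, rng) for _ in range(5)]` draws one communication graph per round. Note that conditioning on connectivity changes the edge distribution, so the returned graphs are no longer exactly G(M, p).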

Videos

Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Cognitive Radio Networks and Spectrum Sensing