Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards
Mengfan Xu, Diego Klabjan

TL;DR
This paper introduces a decentralized multi-agent multi-armed bandit algorithm that effectively manages heterogeneous rewards and random communication graphs, achieving near-optimal regret bounds with high probability.
Contribution
It proposes a novel framework combining robust graph simulation, consensus averaging, and UCB techniques, removing the need for doubly stochastic graphs and handling diverse reward distributions.
Findings
Achieves optimal $\log T$ regret bounds in sub-Gaussian and sub-exponential environments.
Provides high-probability regret bounds that account for graph randomness.
Handles heterogeneous rewards across clients without requiring prior knowledge of the reward distributions.
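To make the $\log T$ claim concrete, one natural way to formalize system-wide regret under heterogeneous rewards is sketched below. The notation is an assumption introduced here for illustration and may differ from the paper's exact definition: $M$ clients, horizon $T$, $\mu_i^m$ the mean reward of arm $i$ at client $m$, and $a_m^t$ the arm client $m$ pulls at time $t$.

$$
R_T \;=\; \frac{1}{M}\left( \max_{i} \sum_{t=1}^{T} \sum_{m=1}^{M} \mu_i^m \;-\; \sum_{t=1}^{T} \sum_{m=1}^{M} \mu_{a_m^t}^m \right)
$$

Under this formalization the benchmark is the arm with the best mean averaged over all clients, which need not be the best arm for any individual client, and a $\log T$ bound means $R_T = O(\log T)$.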
Abstract
We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment. The reward distributions of each arm vary across clients, and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-Gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaborations. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to…
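The ingredients named in the abstract (a random communication graph per round, averaging of information received from neighbors, and a UCB arm choice) can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the graph is drawn from an Erdos-Renyi model with a hypothetical `edge_prob`, rewards are Gaussian with unit variance, each client simply averages the most recent local estimates it has received instead of using the paper's weighting technique, and the function names `sample_erdos_renyi` and `run_sketch` are invented for this example.

```python
import math
import random


def sample_erdos_renyi(num_clients, edge_prob, rng):
    """Sample an undirected Erdos-Renyi graph as neighbor sets (self included)."""
    neighbors = [{m} for m in range(num_clients)]
    for u in range(num_clients):
        for v in range(u + 1, num_clients):
            if rng.random() < edge_prob:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors


def run_sketch(local_means, horizon, edge_prob=0.5, seed=0):
    """Simulate the sketch; local_means[m][k] is client m's mean for arm k.

    Returns counts[m][k], the number of times client m pulled arm k.
    """
    rng = random.Random(seed)
    num_clients, num_arms = len(local_means), len(local_means[0])
    counts = [[0] * num_arms for _ in range(num_clients)]
    local_avg = [[0.0] * num_arms for _ in range(num_clients)]
    # known[m][j]: last local averages client m received from client j.
    known = [[None] * num_clients for _ in range(num_clients)]
    # global_est[m][k]: client m's estimate of arm k's client-averaged mean.
    global_est = [[0.0] * num_arms for _ in range(num_clients)]

    for t in range(1, horizon + 1):
        for m in range(num_clients):
            if t <= num_arms:
                arm = t - 1  # initial round-robin: pull every arm once
            else:
                # UCB index: shared estimate plus a bonus from the local count.
                arm = max(
                    range(num_arms),
                    key=lambda k: global_est[m][k]
                    + math.sqrt(2.0 * math.log(t) / counts[m][k]),
                )
            # Assumption: unit-variance Gaussian (hence sub-Gaussian) rewards.
            reward = rng.gauss(local_means[m][arm], 1.0)
            counts[m][arm] += 1
            local_avg[m][arm] += (reward - local_avg[m][arm]) / counts[m][arm]

        if t < num_arms:
            continue  # wait until every local average is defined

        # The environment draws a fresh graph each round.
        neighbors = sample_erdos_renyi(num_clients, edge_prob, rng)
        for m in range(num_clients):
            for j in neighbors[m]:
                known[m][j] = list(local_avg[j])
            received = [row for row in known[m] if row is not None]
            # Plain average over everything heard so far (possibly stale).
            global_est[m] = [
                sum(row[k] for row in received) / len(received)
                for k in range(num_arms)
            ]
    return counts
```

For example, `run_sketch([[0.9, 0.5], [0.0, 1.0], [0.0, 1.0]], horizon=3000)` sets up a case where the first client locally prefers arm 0 while arm 1 has the higher mean averaged across clients. The sketch has no regret guarantee; it only shows how the three components fit together in a single loop.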
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Cognitive Radio Networks and Spectrum Sensing
