Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits
Guojun Xiong, Jian Li

TL;DR
This paper introduces a decentralized model for multi-player multi-armed bandits where players have limited, dynamic access to arms, and proposes an algorithm that effectively manages exploration, exploitation, and collisions with near-optimal regret guarantees.
Contribution
It formulates a novel multi-player multi-armed walking bandits model addressing limited and dynamic arm access, and develops a decentralized algorithm with strong theoretical and empirical performance.
Findings
The proposed algorithm achieves near-optimal regret bounds.
It effectively manages collisions in a decentralized setting.
Empirical results show competitive performance.
Abstract
Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems. Most research on this problem focuses exclusively on settings in which players have full access to all arms and receive no reward when pulling the same arm. Hence all players solve the same bandit problem with the goal of maximizing their cumulative reward. However, these settings neglect several important factors in many real-world applications, where players have limited access to a dynamic local subset of arms (i.e., an arm could sometimes be "walking" and not accessible to the player). To this end, this paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues. The goal now is to maximize the reward; however, players can only pull arms from the local…
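To make the setting in the abstract concrete, the following is a minimal sketch of one round of a walking-bandit environment. It is not the paper's algorithm or its exact model. The function names (`sample_local_sets`, `play_round`), the Bernoulli rewards, and the independent per-arm access probability are all assumptions introduced for illustration. The rule that colliding players receive zero reward is taken from the standard setting the abstract describes and is assumed here to carry over.

```python
import random


def sample_local_sets(num_players, num_arms, access_prob, rng):
    """Draw each player's accessible arms for one round.

    Assumption: every arm is independently accessible with probability
    `access_prob`; a player left with no arm gets one arm at random.
    """
    local_sets = []
    for _ in range(num_players):
        arms = {a for a in range(num_arms) if rng.random() < access_prob}
        if not arms:
            arms = {rng.randrange(num_arms)}
        local_sets.append(arms)
    return local_sets


def play_round(means, local_sets, choices, rng):
    """Return one reward per player for the chosen arms.

    Assumptions: Bernoulli rewards with mean `means[arm]`; players that
    pull the same arm collide and all receive 0.
    """
    pulls = {}
    for player, arm in enumerate(choices):
        if arm not in local_sets[player]:
            raise ValueError(f"player {player} cannot access arm {arm}")
        pulls[arm] = pulls.get(arm, 0) + 1

    rewards = []
    for arm in choices:
        if pulls[arm] > 1:
            rewards.append(0.0)  # collision: no reward
        else:
            rewards.append(1.0 if rng.random() < means[arm] else 0.0)
    return rewards
```

A decentralized player would observe only its own local set and its own reward each round; the sketch exposes the full state only so that a round can be simulated.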
Taxonomy
Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Smart Grid Energy Management
