Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities
Hong Xie, Jinyu Mo, Defu Lian, Jie Wang, Enhong Chen

TL;DR
This paper introduces a novel distributed multi-agent multi-armed bandit model with stochastic request arrivals and proposes algorithms for players to learn and agree on optimal arm allocation without communication, validated through experiments.
Contribution
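It develops new algorithms for multi-agent bandits with stochastic arm capacities: a polynomial-time greedy algorithm that locates an optimal arm pulling profile, an iterative distributed algorithm by which players commit to that profile, and an explore-then-commit (ETC) strategy for the online setting with unknown model parameters.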
Findings
A greedy algorithm locates an optimal arm pulling profile with polynomial computational complexity
Players commit to an optimal arm pulling profile in a constant number of rounds in expectation
Experimental validation confirms effectiveness
Abstract
Motivated by distributed selection problems, we formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the stochastic arrival of requests to each arm as well as the policy for allocating requests to players. The challenge is to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile (an arm pulling profile prescribes the number of players at each arm) without communicating with each other. We first design a greedy algorithm that locates one of the optimal arm pulling profiles with polynomial computational complexity. We also design an iterative distributed algorithm for players to commit to an optimal arm pulling profile in a constant number of rounds in expectation. We apply the explore-then-commit (ETC) framework to address the online setting in which model parameters are unknown. We design an…
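To make the idea of greedily locating an arm pulling profile concrete, here is a minimal sketch under assumptions that are not stated in the abstract: each arm k receives a random number of requests D_k with a known distribution, each served request yields a mean reward mu_k, and n players on arm k serve min(n, D_k) requests. Under that assumed model the marginal gain of one more player on arm k is mu_k * P(D_k >= n + 1), which is non-increasing in n, so adding players one at a time to the arm with the largest marginal gain yields an optimal profile. The names `marginal_gain` and `greedy_profile` are hypothetical and this is not necessarily the paper's exact algorithm.

```python
import heapq


def marginal_gain(mu, pmf, n):
    """Extra expected reward from adding the (n+1)-th player to an arm.

    Assumed model: pmf[d] = P(D = d) for d = 0..len(pmf)-1, and n players
    serve min(n, D) requests, so the gain is mu * P(D >= n + 1).
    """
    return mu * sum(pmf[n + 1:])


def greedy_profile(mus, pmfs, num_players):
    """Assign players one at a time to the arm with the largest marginal gain."""
    profile = [0] * len(mus)
    # Max-heap via negated gains; ties are broken by the smaller arm index.
    heap = [(-marginal_gain(mus[k], pmfs[k], 0), k) for k in range(len(mus))]
    heapq.heapify(heap)
    for _ in range(num_players):
        _, k = heapq.heappop(heap)
        profile[k] += 1
        heapq.heappush(heap, (-marginal_gain(mus[k], pmfs[k], profile[k]), k))
    return profile


# Hypothetical instance: two arms, three players.
mus = [1.0, 0.6]
pmfs = [[0.0, 0.5, 0.5],   # arm 0: one or two requests, equally likely
        [0.0, 0.0, 1.0]]   # arm 1: always exactly two requests
profile = greedy_profile(mus, pmfs, 3)
```

With a heap, each of the M assignments costs O(log K) plus the cost of one tail-probability evaluation, so the sketch runs in time polynomial in the number of players and arms, in the spirit of the complexity claim above.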
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
