Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities

Hong Xie; Jinyu Mo; Defu Lian; Jie Wang; Enhong Chen

arXiv:2408.10865 · cs.AI · August 21, 2024


Open Access

TL;DR

This paper introduces a distributed multi-player multi-armed bandit model with stochastic request arrivals at each arm and proposes algorithms that let players learn and agree on an optimal arm pulling profile without communicating, validated through experiments.

Contribution

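The paper develops distributed algorithms for multi-player bandits with stochastic arm capacities: a polynomial-time greedy algorithm that locates an optimal arm pulling profile, an iterative algorithm by which players commit to that profile without communication, and an explore-then-commit (ETC) scheme for the online setting with unknown model parameters.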

Findings

01 A greedy algorithm locates an optimal arm pulling profile in polynomial time

02 Players commit to an optimal profile in a constant number of rounds in expectation

03 Experimental validation confirms effectiveness

Abstract

Motivated by distributed selection problems, we formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the stochastic arrival of requests to each arm as well as the policy for allocating requests to players. The challenge is to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile (an arm pulling profile prescribes the number of players at each arm) without communicating with each other. We first design a greedy algorithm that locates one of the optimal arm pulling profiles with polynomial computational complexity. We also design an iterative distributed algorithm for players to commit to an optimal arm pulling profile within a constant number of rounds in expectation. We apply the explore-then-commit (ETC) framework to address the online setting in which model parameters are unknown. We design an…
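To make the idea of greedily locating an arm pulling profile concrete, here is a minimal Python sketch. It is not the paper's algorithm: the reward model is an assumption made for illustration, namely that an arm with per-request reward `mu` and a random number of requests `D` yields expected reward `mu * E[min(n, D)]` when `n` players pull it. Under that assumption the marginal gain of one more player is `mu * P(D >= n + 1)`, which never increases with `n`, so assigning players one at a time to the arm with the largest marginal gain is optimal. The names `marginal_gain`, `greedy_profile`, and `capacity_pmfs` are hypothetical.

```python
from typing import List, Sequence


def marginal_gain(mu: float, capacity_pmf: Sequence[float], n: int) -> float:
    """Expected extra reward from adding the (n+1)-th player to one arm.

    Assumed model (not taken from the paper): expected reward with n players
    is mu * E[min(n, D)], where capacity_pmf[d] = P(D = d).
    The increment from n to n + 1 players is mu * P(D >= n + 1).
    """
    return mu * sum(capacity_pmf[n + 1:])


def greedy_profile(
    mus: Sequence[float],
    capacity_pmfs: Sequence[Sequence[float]],
    num_players: int,
) -> List[int]:
    """Assign players one by one to the arm with the largest marginal gain."""
    num_arms = len(mus)
    profile = [0] * num_arms  # profile[m] = number of players at arm m
    for _ in range(num_players):
        gains = [
            marginal_gain(mus[m], capacity_pmfs[m], profile[m])
            for m in range(num_arms)
        ]
        best = max(range(num_arms), key=lambda m: gains[m])
        profile[best] += 1
    return profile


if __name__ == "__main__":
    # Arm 0: reward 1.0 per request, 1 or 2 requests with equal probability.
    # Arm 1: reward 0.6 per request, always exactly 2 requests.
    mus = [1.0, 0.6]
    capacity_pmfs = [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
    print(greedy_profile(mus, capacity_pmfs, 3))  # [1, 2]
```

This sketch runs in time proportional to the number of players times the number of arms (times the support size of each capacity distribution), which is polynomial in the problem size. It assumes the distributions are known; in the online setting described above they would first have to be estimated during an exploration phase.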


Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems