Multi-player Multi-armed Bandits with Collision-Dependent Reward Distributions
Chengshuai Shi, Cong Shen

TL;DR
This paper introduces EC3, an algorithm for multi-player multi-armed bandits (MP-MAB) with collision-dependent rewards. It models implicit communication between players as reliable communication over a noisy channel and achieves regret that approaches the centralized lower bound.
Contribution
It proposes the EC3 algorithm, which uses error-correcting codes for implicit communication when a collision changes the reward distribution instead of forcing a zero reward, improving regret bounds for no-sensing MP-MAB problems.
Findings
EC3 achieves near-optimal regret approaching the centralized lower bound.
The choice of coding schemes significantly impacts regret performance.
Experimental results validate EC3's superiority on synthetic and real datasets.
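To illustrate why the coding scheme matters, the sketch below computes the exact decoding error rate of a repetition code with majority-vote decoding over a binary symmetric channel. The repetition code, the channel model, the flip probability of 0.1, and the function name `repetition_error_rate` are all assumptions chosen for simplicity, not details taken from the paper. The sketch only shows the general tradeoff: longer codewords lower the error rate but cost more rounds of communication.

```python
from math import comb


def repetition_error_rate(n: int, p: float) -> float:
    """Exact majority-vote error probability for a length-n repetition code.

    Assumes a binary symmetric channel that flips each bit independently
    with probability p (an illustrative channel model, not the paper's).
    """
    if n % 2 == 0:
        raise ValueError("use an odd code length so majority vote has no ties")
    # Decoding fails when more than half of the n transmitted bits are flipped.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(f"code length {n}: error rate {repetition_error_rate(n, 0.1):.6f}")
```

Each extra pair of repeated bits lowers the error rate but lengthens the communication phase, which is the kind of code-length versus decoding-error tradeoff the abstract refers to.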
Abstract
We study a new stochastic multi-player multi-armed bandits (MP-MAB) problem, where the reward distribution changes if a collision occurs on the arm. Existing literature always assumes a zero reward for the involved players if a collision happens, but for applications such as cognitive radio, the more realistic scenario is that a collision reduces the mean reward but not necessarily to zero. We focus on the more practical no-sensing setting, where players do not perceive collisions directly, and propose the Error-Correction Collision Communication (EC3) algorithm, which models implicit communication as a problem of reliable communication over a noisy channel, for which the random coding error exponent is used to establish the optimal regret that no communication protocol can beat. Finally, optimizing the tradeoff between code length and decoding error rate leads to a regret that approaches the centralized…
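As a rough illustration of the collision-dependent reward model, the sketch below simulates a single arm with Bernoulli rewards whose mean drops, but not to zero, when a collision occurs. The Bernoulli assumption, the means 0.8 and 0.3, and the helper names `pull` and `empirical_means` are hypothetical choices for this example and do not come from the paper. In the no-sensing setting a player sees only the reward, so a single pull is a noisy observation of whether a collision happened.

```python
import random


def pull(mean_no_collision: float, mean_collision: float,
         collided: bool, rng: random.Random) -> int:
    """Draw one Bernoulli reward; a collision lowers the mean without zeroing it."""
    mean = mean_collision if collided else mean_no_collision
    return 1 if rng.random() < mean else 0


def empirical_means(mean_no_collision: float, mean_collision: float,
                    n_samples: int, seed: int = 0) -> tuple[float, float]:
    """Estimate the mean reward with and without a collision from samples."""
    rng = random.Random(seed)
    free = sum(pull(mean_no_collision, mean_collision, False, rng)
               for _ in range(n_samples)) / n_samples
    hit = sum(pull(mean_no_collision, mean_collision, True, rng)
              for _ in range(n_samples)) / n_samples
    return free, hit


if __name__ == "__main__":
    free, hit = empirical_means(0.8, 0.3, n_samples=20000)
    print(f"mean reward without collision: {free:.3f}")
    print(f"mean reward with collision:    {hit:.3f}")
```

Because a reward of 1 is possible both with and without a collision, one observation cannot reveal the collision reliably, which is why treating deliberate collisions as bits sent over a noisy channel, protected by an error-correcting code, is a natural way to frame the problem.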
