Decentralized Heterogeneous Multi-Player Multi-Armed Bandits with Non-Zero Rewards on Collisions

Akshayaa Magesh; Venugopal V. Veeravalli

arXiv:1910.09089·cs.LG·December 30, 2021

Open Access

TL;DR

This paper introduces a decentralized algorithm for multi-player multi-armed bandits with heterogeneous rewards and non-zero rewards on collisions. It achieves near order-optimal expected regret without prior knowledge of the time horizon.

Contribution

It proposes a novel policy for decentralized multi-player bandits with heterogeneity and non-zero collision rewards, achieving near order-optimal regret.

Findings

01

Achieves expected regret of order O(log^{1+δ} T)

02

Handles more players than arms without communication

03

Supports non-zero rewards on collisions

Abstract

We consider a fully decentralized multi-player stochastic multi-armed bandit setting where the players cannot communicate with each other and can observe only their own actions and rewards. The environment may appear differently to different players, i.e., the reward distributions for a given arm are heterogeneous across players. In the case of a collision (when more than one player plays the same arm), we allow for the colliding players to receive non-zero rewards. The time-horizon $T$ for which the arms are played is not known to the players. Within this setup, where the number of players is allowed to be greater than the number of arms, we present a policy that achieves near order-optimal expected regret of order $O(\log^{1+\delta} T)$ for some $0 < \delta < 1$ over a time-horizon of duration $T$. This paper is accepted at IEEE Transactions on Information Theory.
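To make the setting in the abstract concrete, the following is a minimal Python sketch of a single round of play. It is an illustration only, not the paper's policy or its exact reward model: the function names, the Bernoulli rewards, the example means, and in particular the `collision_scale` factor (a hypothetical constant by which a player's mean reward shrinks when it shares an arm) are all assumptions made for this example. The abstract states only that colliding players may receive non-zero rewards.

```python
import random


def expected_rewards(means, choices, collision_scale=0.5):
    """Expected reward of each player for one round.

    means[p][a] is the mean reward of arm a as seen by player p
    (heterogeneous: rows may differ between players).
    collision_scale is a hypothetical factor applied when an arm is
    shared; it is an assumption of this sketch, not the paper's model.
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    return [
        means[p][arm] * (collision_scale if counts[arm] > 1 else 1.0)
        for p, arm in enumerate(choices)
    ]


def play_round(means, choices, collision_scale=0.5, rng=None):
    """Draw Bernoulli rewards; player p would observe only entry p."""
    rng = rng or random.Random()
    probs = expected_rewards(means, choices, collision_scale)
    return [1 if rng.random() < p else 0 for p in probs]


# Three players, two arms: more players than arms, so at least one
# arm must be shared in every round.
MEANS = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]]
CHOICES = [0, 0, 1]  # players 0 and 1 collide on arm 0

print(expected_rewards(MEANS, CHOICES))
print(play_round(MEANS, CHOICES, rng=random.Random(0)))
```

In this toy model the two colliding players still earn a positive expected reward, which is the feature that distinguishes the setting from the common zero-reward-on-collision assumption. A decentralized policy would have to choose `CHOICES` for each player using only that player's own past actions and rewards.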

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Smart Grid Energy Management