Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

Wei Huang; Richard Combes; Cindy Trinh

arXiv:2103.13059 · stat.ML · June 7, 2022 · 5 cites

Open Access

TL;DR

This paper introduces a new algorithm for multi-player multi-armed bandits without collision sensing. The algorithm does not require a lower bound on the minimal expected reward of an arm as input, its regret upper bound does not scale inversely with that minimal reward, and it outperforms state-of-the-art algorithms in numerical experiments.

Contribution

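The proposed algorithm does not need a lower bound on the minimal expected reward of an arm as input, and its performance does not scale inversely with that minimal reward. Both claims are backed by a proven regret upper bound, and numerical experiments show it outperforming state-of-the-art algorithms.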

Findings

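01 Outperforms state-of-the-art algorithms in numerical experiments

02 Does not require a lower bound on the minimal expected reward as input

03 Has a regret upper bound that does not scale inversely with the minimal expected reward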

Abstract

We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need as an input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments, showing that the proposed algorithm outperforms the state of the art in practice as well.
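To make the setting concrete, here is a minimal sketch of one round of play. It is an illustration only, not the authors' algorithm. It assumes the usual model for this problem, which the abstract does not spell out: Bernoulli rewards, a reward of zero whenever two or more players pick the same arm, and no separate collision signal. The names `play_round`, `means`, and `choices` are hypothetical.

```python
import random


def play_round(means, choices, rng):
    """Simulate one round and return the reward each player observes.

    means   -- assumed expected Bernoulli reward of each arm (hypothetical values)
    choices -- arm index chosen by each player in this round
    rng     -- a random.Random instance
    """
    # Count how many players picked each arm.
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1

    observed = []
    for arm in choices:
        if counts[arm] > 1:
            # Collision: the player receives 0 and gets no flag saying why.
            observed.append(0)
        else:
            # Sole player on the arm: draw a Bernoulli reward.
            observed.append(1 if rng.random() < means[arm] else 0)
    return observed


rng = random.Random(0)
means = [1.0, 0.0, 1.0]  # extreme means chosen so the outcome is deterministic

# Players 0 and 1 collide on arm 0; player 2 is alone on arm 2.
print(play_round(means, [0, 0, 2], rng))

# No collisions, but the player on arm 1 (mean 0.0) still observes 0.
print(play_round(means, [1, 0, 2], rng))
```

In both calls some player observes a zero, once because of a collision and once because the arm itself pays nothing. Without collision sensing the player cannot tell these cases apart from a single observation, which suggests why the minimal expected reward plays such a central role in this problem.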

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management