Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information
Wei Huang, Richard Combes, Cindy Trinh

TL;DR
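This paper introduces a new algorithm for multi-player multi-armed bandits without collision sensing. Unlike prior state-of-the-art algorithms, it does not take a lower bound on the minimal expected reward of an arm as input, and its performance does not scale inversely with that minimal reward. The authors support both claims with a regret upper bound and numerical experiments.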
Contribution
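The proposed algorithm removes the need for a lower bound on the minimal expected reward as an input, and its performance does not scale inversely proportionally to the minimal expected reward. Both claims are backed by a proven regret upper bound, and in numerical experiments the algorithm outperforms state-of-the-art methods.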
Findings
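Outperforms state-of-the-art algorithms in numerical experiments
Does not require a lower bound on the minimal expected reward as an input
Comes with a proven regret upper bound that does not scale inversely with the minimal expected reward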
Abstract
We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need as an input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments, showing that the proposed algorithm outperforms state-of-the-art in practice as well.
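To make the setting concrete, the following is a minimal sketch of one round of a multi-player bandit without collision sensing, under the model commonly used in this literature: players who pick the same arm all receive zero, and each player observes only its own reward, so a zero caused by a collision looks the same as a zero drawn from the arm. This is not the paper's algorithm. The function name `play_round`, the Bernoulli reward assumption, and the example means are all hypothetical choices made for illustration.

```python
import random


def play_round(means, choices, rng):
    """Simulate one round and return the reward observed by each player.

    means:   expected Bernoulli reward of each arm (illustrative values)
    choices: arm index chosen by each player in this round
    rng:     a random.Random instance
    """
    # Count how many players picked each arm.
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1

    observed = []
    for arm in choices:
        collided = counts[arm] > 1
        draw = 1 if rng.random() < means[arm] else 0
        # No collision sensing: the player sees only this number,
        # never the `collided` flag itself.
        observed.append(0 if collided else draw)
    return observed


rng = random.Random(0)
means = [0.9, 0.05]  # hypothetical arm means; the second arm is rarely rewarding
print(play_round(means, [0, 0], rng))  # both players collide on arm 0
print(play_round(means, [0, 1], rng))  # no collision
```

On the low-mean arm, a player sees zeros most of the time even when it is alone, which gives some intuition for why earlier approaches relied on a known lower bound on the minimal expected reward to tell collisions apart from unlucky draws.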
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
