Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms
Xuchuang Wang, Hong Xie, John C.S. Lui

TL;DR
This paper introduces a multi-armed bandit problem with shareable, finite-capacity arms, derives theoretical bounds for it, and proposes an algorithm that learns load-dependent rewards; experiments on network selection validate the approach.
Contribution
The paper generalizes multiple-play multi-armed bandits (MP-MAB) to shareable, finite-capacity arms with unknown reward distributions, offering lower bounds, a capacity estimator, and an online algorithm with proven regret bounds.
Findings
Proved lower bounds for capacity learning and for regret in the shareable-arm setting.
Designed an algorithm with regret bounds matching theoretical lower bounds.
Validated the approach with experiments in 5G/4G base station selection.
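Regret in this setting is measured against the best allocation of plays to arms. As a minimal sketch, assuming the per-load means and capacities were known and nonnegative, an arm k that receives a_k plays earns expected reward mu_k * min(a_k, m_k), so each extra play on arm k adds mu_k until its capacity m_k is reached and nothing afterwards. Filling arms greedily in descending order of mean therefore maximizes the total. The function names `oracle_allocation`, `expected_reward`, and `per_round_regret` are hypothetical illustrations of this benchmark, not the paper's algorithm.

```python
def oracle_allocation(means, capacities, num_plays):
    """Hypothetical offline oracle: split num_plays plays across arms to maximize
    sum_k means[k] * min(plays[k], capacities[k]), assuming known, nonnegative means."""
    plays = [0] * len(means)
    remaining = num_plays
    # A play on arm k adds means[k] until capacities[k] is reached, then adds 0,
    # so filling arms in descending order of mean is optimal.
    for k in sorted(range(len(means)), key=lambda i: means[i], reverse=True):
        take = min(capacities[k], remaining)
        plays[k] += take
        remaining -= take
        if remaining == 0:
            break
    # Plays left over after all capacity is used earn nothing; park them on the best arm.
    if remaining > 0 and plays:
        best = max(range(len(means)), key=lambda i: means[i])
        plays[best] += remaining
    return plays


def expected_reward(means, capacities, plays):
    """Expected total reward: per-load mean times the effective load min(plays, capacity)."""
    return sum(mu * min(a, m) for mu, m, a in zip(means, capacities, plays))


def per_round_regret(means, capacities, plays):
    """Expected reward gap between the oracle allocation and the given allocation."""
    opt = oracle_allocation(means, capacities, sum(plays))
    return expected_reward(means, capacities, opt) - expected_reward(means, capacities, plays)


if __name__ == "__main__":
    means, caps = [0.9, 0.5, 0.2], [2, 3, 4]
    print(oracle_allocation(means, caps, 4))           # greedy fill: [2, 2, 0]
    print(per_round_regret(means, caps, [0, 0, 4]))    # about 2.0
```

With 4 plays, the oracle saturates the best arm (capacity 2) and puts the remaining two plays on the second arm; putting all four on the worst arm loses roughly 2.0 in expected reward per round.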
Abstract
We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arm setting, in which several plays can share the same arm. Furthermore, each shareable arm has a finite reward capacity and a "per-load" reward distribution, both of which are unknown to the learner. The reward from a shareable arm is load-dependent: it is the "per-load" reward multiplied by either the number of plays pulling the arm or, when the number of plays exceeds the capacity limit, its reward capacity. When the "per-load" reward follows a Gaussian distribution, we prove a sample complexity lower bound for learning the capacity from load-dependent rewards and also a regret lower bound for this new MP-MAB problem. We devise a capacity estimator whose sample complexity upper bound matches the lower bound in terms of reward means and capacities. We also propose an online learning algorithm to…
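The load-dependent reward described in the abstract can be simulated directly: draw one Gaussian per-load reward and scale it by min(number of plays, capacity). The sketch below does that, then uses it for a deliberately naive capacity estimate: compare the average reward at load 1 with the average at a load assumed to be at least the capacity, and round the ratio. This ratio estimator, the noise level `sigma`, and the helper names are hypothetical choices for illustration; it is not the paper's capacity estimator and carries none of its sample-complexity guarantees.

```python
import random
from statistics import mean


def load_dependent_reward(per_load_mean, capacity, num_plays, sigma, rng):
    """One observation of a shareable arm's reward.

    Assumption: a single Gaussian per-load draw N(per_load_mean, sigma^2)
    is scaled by the effective load min(num_plays, capacity).
    """
    per_load = rng.gauss(per_load_mean, sigma)
    return per_load * min(num_plays, capacity)


def naive_capacity_estimate(per_load_mean, capacity, max_plays, samples, sigma, rng):
    """Hypothetical ratio estimator (not the paper's estimator).

    Assumes max_plays >= capacity and per_load_mean > 0, so the mean reward at
    load max_plays is per_load_mean * capacity and the mean at load 1 is per_load_mean.
    The true capacity is passed only to simulate observations.
    """
    single = [load_dependent_reward(per_load_mean, capacity, 1, sigma, rng)
              for _ in range(samples)]
    full = [load_dependent_reward(per_load_mean, capacity, max_plays, sigma, rng)
            for _ in range(samples)]
    ratio = mean(full) / mean(single)
    # Round to the nearest integer capacity and clip to the feasible range [1, max_plays].
    return max(1, min(max_plays, round(ratio)))


if __name__ == "__main__":
    rng = random.Random(0)
    print(naive_capacity_estimate(0.8, 3, 6, 5000, 0.5, rng))  # close to the true capacity 3
```

The ratio works only because the per-load mean cancels; when that mean is near zero the denominator becomes unstable, which is one reason a naive estimator like this is a poor substitute for a method with proven sample-complexity bounds.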
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Cognitive Radio Networks and Spectrum Sensing
Methods: Balanced Selection
