Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms

Xuchuang Wang; Hong Xie; John C.S. Lui

arXiv:2206.08776·cs.LG·June 20, 2022·1 cites

TL;DR

This paper introduces a novel multi-armed bandit problem with shareable, finite-capacity arms, providing theoretical bounds and an algorithm that effectively learns load-dependent rewards, validated by experiments in network selection.

Contribution

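The paper generalizes multiple-play multi-armed bandits (MP-MAB) to shareable, finite-capacity arms whose reward capacities and per-load reward distributions are both unknown to the learner. It offers lower bounds, a capacity estimator, and an online algorithm with proven regret bounds.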

Findings

01

Proved a sample-complexity lower bound for learning capacity and a regret lower bound in the shareable-arm setting, assuming Gaussian per-load rewards.

02

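Designed a capacity estimator whose sample-complexity upper bound matches the lower bound in terms of reward means and capacities, along with an online algorithm with proven regret bounds.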

03

Validated the approach with experiments in 5G/4G base station selection.

Abstract

We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arm setting, in which several plays can share the same arm. Furthermore, each shareable arm has a finite reward capacity and a "per-load" reward distribution, both of which are unknown to the learner. The reward from a shareable arm is load-dependent: it equals the "per-load" reward multiplied by either the number of plays pulling the arm or, when the number of plays exceeds the capacity limit, the arm's reward capacity. When the "per-load" reward follows a Gaussian distribution, we prove a sample complexity lower bound for learning the capacity from load-dependent rewards and also a regret lower bound for this new MP-MAB problem. We devise a capacity estimator whose sample complexity upper bound matches the lower bound in terms of reward means and capacities. We also propose an online learning algorithm to…
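The load-dependent reward described in the abstract can be illustrated with a minimal Python sketch. It assumes the reward is the per-load reward times min(number of plays, capacity), which is how the abstract's wording reads. The function names, parameter names, and numeric values are hypothetical and chosen only for illustration; this is not the paper's capacity estimator or learning algorithm.

```python
import random


def expected_reward(mean_per_load, num_plays, capacity):
    """Mean reward of one arm under the assumed load-dependent model.

    The reward grows linearly with the number of plays until the
    arm's capacity is reached, then stays flat.
    """
    return mean_per_load * min(num_plays, capacity)


def sample_reward(mean_per_load, std_per_load, num_plays, capacity, rng):
    """Draw one noisy reward, assuming a Gaussian per-load reward."""
    per_load = rng.gauss(mean_per_load, std_per_load)
    return per_load * min(num_plays, capacity)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Hypothetical arm: per-load mean 0.5, capacity 3 (unknown to a learner).
    for plays in range(1, 6):
        mean = expected_reward(0.5, plays, 3)
        draw = sample_reward(0.5, 0.1, plays, 3, rng)
        print(f"plays={plays} expected={mean:.2f} sampled={draw:.3f}")
```

Under this model, assigning more plays than the capacity adds no reward, which is why the learner has to infer the capacity from the point where observed rewards stop growing with load.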

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Cognitive Radio Networks and Spectrum Sensing

Methods: Balanced Selection