Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing
Hong Xie, Haoran Gu, Yanying Huang, Tao Tan, Defu Lian

TL;DR
This paper introduces a multiple-play stochastic bandit model with prioritized resource sharing, proves regret lower bounds for it, and gives an efficient algorithm for optimal play allocation in resource management scenarios.
Contribution
It develops a new bandit model with prioritized capacity sharing, derives regret bounds, and proposes algorithms for optimal resource allocation under this model.
Findings
Proved regret lower bounds for the model.
Designed an algorithm with regret bounds matching lower bounds.
Addressed complex nonlinear utility optimization challenges.
Abstract
This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising from LLM applications, edge intelligence, etc. The model is composed of arms and plays. Each arm has a stochastic number of capacities, and each unit of capacity is associated with a reward function. Each play is associated with a priority weight. When multiple plays compete for the capacity of an arm, that capacity is allocated in a larger-priority-weight-first manner. Instance-independent and instance-dependent regret lower bounds are proved; both depend on the largest priority weight and on a parameter that characterizes the reward tail. When model parameters are given, we design an algorithm named MSB-PRS-OffOpt to locate the optimal play allocation policy with a…
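The larger-priority-weight-first sharing rule described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's MSB-PRS-OffOpt algorithm; it only shows, for one round with already-realized capacities, which plays would be served. The function name allocate_capacity, its arguments, and the tie-breaking by play index are assumptions made for illustration.

```python
from collections import defaultdict


def allocate_capacity(assignment, weights, capacities):
    """Return the sorted list of plays that receive one unit of capacity.

    assignment[k] : index of the arm that play k is assigned to
    weights[k]    : priority weight of play k
    capacities[m] : realized (already sampled) capacity of arm m this round
    """
    # Group plays by the arm they compete for.
    plays_on_arm = defaultdict(list)
    for play, arm in enumerate(assignment):
        plays_on_arm[arm].append(play)

    served = []
    for arm, plays in plays_on_arm.items():
        # Larger priority weight first; ties broken by play index (assumption).
        ranked = sorted(plays, key=lambda k: (-weights[k], k))
        # Only as many plays as the arm has capacity units are served.
        served.extend(ranked[: capacities[arm]])
    return sorted(served)


# Hypothetical round: three plays compete for arm 0, one play uses arm 1.
assignment = [0, 0, 0, 1]
weights = [0.9, 0.5, 0.7, 0.2]
capacities = [2, 1]
served = allocate_capacity(assignment, weights, capacities)
print(served)
```

In this hypothetical round, arm 0 has two units of capacity for three competing plays, so the play with the smallest priority weight among them goes unserved.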
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques
