Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing

Hong Xie; Haoran Gu; Yanying Huang; Tao Tan; Defu Lian

arXiv:2512.21626·cs.AI·December 29, 2025

TL;DR

This paper introduces a novel multi-play stochastic bandit model with prioritized resource sharing, providing theoretical regret bounds and an efficient algorithm for optimal play allocation in complex resource management scenarios.

Contribution

It develops a new bandit model with prioritized capacity sharing, derives regret bounds, and proposes algorithms for optimal resource allocation under this model.

Findings

01. Proved regret lower bounds for the model.

02. Designed an algorithm with regret bounds matching lower bounds.

03. Addressed complex nonlinear utility optimization challenges.

Abstract

This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising in LLM applications, edge intelligence, and similar settings. The model consists of $M$ arms and $K$ plays. Each arm has a stochastic capacity (a random number of capacity units), and each unit of capacity is associated with a reward function. Each play is associated with a priority weight. When multiple plays compete for an arm's capacity, the capacity is allocated to plays in decreasing order of priority weight. Instance-independent and instance-dependent regret lower bounds of $\Omega(\alpha_{1} \sigma \sqrt{KMT})$ and $\Omega(\alpha_{1} \sigma^{2} \frac{M}{\Delta} \ln T)$ are proved, where $\alpha_{1}$ is the largest priority weight and $\sigma$ characterizes the reward tail. When model parameters are given, we design an algorithm named \texttt{MSB-PRS-OffOpt} to locate the optimal play allocation policy with a…
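The priority rule described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it only shows how one arm's realized capacity might be handed out to competing plays, largest priority weight first. The function names, the tie-breaking by input order, and the assumption that a served play earns its weight times a per-unit reward are all hypothetical choices made for illustration.

```python
def allocate_capacity(weights, capacity):
    """Return indices of the plays served on one arm.

    weights  -- priority weights of the plays assigned to this arm
    capacity -- realized number of capacity units on the arm
    Plays are served in decreasing order of weight; ties keep input
    order (an assumption, the abstract does not specify tie-breaking).
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:max(capacity, 0)]


def weighted_reward(weights, capacity, unit_reward):
    """Total reward on one arm, ASSUMING each served play earns
    weight * unit_reward. This utility form is a hypothetical choice."""
    served = allocate_capacity(weights, capacity)
    return sum(weights[i] * unit_reward for i in served)


# Three plays compete for an arm that realizes two units of capacity.
served = allocate_capacity([3, 5, 1], capacity=2)
total = weighted_reward([3, 5, 1], capacity=2, unit_reward=2)
print(served, total)
```

In this toy run the plays with weights 5 and 3 are served and the play with weight 1 receives nothing, which is why the allocation of plays to arms matters when capacity is scarce.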

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques