Allocating Divisible Resources on Arms with Unknown and Random Rewards
Ningyuan Chen, Wenhao Li

TL;DR
This paper studies how to allocate a divisible resource among multiple arms with unknown, random rewards whose variance depends on the amount allocated. The model bridges the bandit and full-feedback settings, and the paper introduces algorithms with optimal regret bounds.
Contribution
It develops algorithms that achieve optimal regret bounds for a unified model interpolating between the bandit and full-feedback settings, supported by a novel concentration inequality.
Findings
Achieves optimal regret bounds for all b in [0,1]
Identifies a phase transition at b=1/2
Develops a new concentration inequality for fractional weights
Abstract
We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is $Y_i(A_i) = A_i \mu_i + A_i^b \xi_i$, where $\mu_i$ is the unknown mean and the noise $\xi_i$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b \in [0,1]$, and demonstrate a phase transition at $b = 1/2$. The theoretical results hinge on a novel…
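The reward model in the abstract can be made concrete with a short simulation. The following is a minimal sketch, not the authors' code: it assumes the reward takes the form Y_i = A_i * mu_i + A_i^b * xi_i, uses Gaussian noise as one example of sub-Gaussian noise, and adopts the convention that an arm receiving no resource returns zero. The names `sample_rewards` and `normalized_noise_std` are hypothetical helpers introduced only for illustration.

```python
import random


def sample_rewards(allocation, means, b, sigma=1.0, rng=None):
    """Draw one period of rewards under the assumed model
    Y_i = A_i * mu_i + A_i**b * xi_i, with xi_i ~ N(0, sigma^2).

    Gaussian noise is an illustrative choice; the abstract only
    requires the noise to be independent and sub-Gaussian.
    """
    rng = rng or random.Random()
    if any(a < 0 for a in allocation) or abs(sum(allocation) - 1.0) > 1e-9:
        raise ValueError("allocation must be nonnegative and sum to one unit")
    if not 0.0 <= b <= 1.0:
        raise ValueError("the order b must lie in [0, 1]")
    rewards = []
    for a, mu in zip(allocation, means):
        if a == 0:
            # Convention assumed in this sketch: no resource, no reward.
            rewards.append(0.0)
            continue
        noise = rng.gauss(0.0, sigma)
        rewards.append(a * mu + (a ** b) * noise)
    return rewards


def normalized_noise_std(a, b, sigma=1.0):
    """Noise level of the per-unit estimate Y_i / A_i = mu_i + A_i**(b-1) * xi_i."""
    return sigma * a ** (b - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.3, 0.5, 0.8]          # hypothetical arm means
    uniform = [1 / 3, 1 / 3, 1 / 3]  # spread the unit of resource evenly
    for b in (0.0, 0.5, 1.0):
        rewards = sample_rewards(uniform, means, b, rng=rng)
        print(b, rewards, normalized_noise_std(1 / 3, b))
```

Dividing an observed reward by the allocation gives an estimate of the mean whose noise scales as A_i^(b-1). Under this assumed form, at b = 1 the estimate is equally noisy for any positive allocation, while at b = 0 a smaller allocation yields a proportionally noisier estimate.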
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Data Stream Mining Techniques
