Allocating Divisible Resources on Arms with Unknown and Random Rewards

Ningyuan Chen; Wenhao Li

arXiv:2306.16578·cs.LG·November 6, 2023

TL;DR

This paper studies how to allocate a divisible resource among multiple arms with unknown, random rewards whose variance depends on the amount allocated. The model bridges the bandit and full-feedback settings, and the paper introduces algorithms with optimal regret bounds.

Contribution

It develops algorithms that achieve optimal regret bounds for a unified model interpolating between the bandit and full-feedback settings; the analysis relies on a novel concentration inequality.

Findings

1. Achieves optimal regret bounds for all $b \in [0, 1]$

2. Identifies a phase transition at $b = 1/2$

3. Develops a new concentration inequality for fractional weights

Abstract

We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_{i}$ to arm $i$ in a period, then the reward $Y_{i}$ is $Y_{i}(A_{i}) = A_{i} \mu_{i} + A_{i}^{b} \xi_{i}$, where $\mu_{i}$ is the unknown mean and the noise $\xi_{i}$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b \in [0, 1]$, and demonstrate a phase transition at $b = 1/2$. The theoretical results hinge on a novel…
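The reward model in the abstract can be simulated in a few lines. The sketch below is illustrative only and is not one of the paper's algorithms: the function name `sample_rewards`, the `noise_std` parameter, the choice of Gaussian noise (one particular sub-Gaussian distribution), the requirement that allocations sum to at most one, and the convention that an arm with zero allocation returns exactly zero are all assumptions made here.

```python
import numpy as np


def sample_rewards(allocation, means, b, rng, noise_std=1.0):
    """Draw one period of rewards Y_i = A_i * mu_i + A_i**b * xi_i.

    Hypothetical helper for illustration; not taken from the paper.
    """
    allocation = np.asarray(allocation, dtype=float)
    means = np.asarray(means, dtype=float)
    if allocation.shape != means.shape:
        raise ValueError("allocation and means must have the same shape")
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must lie in [0, 1]")
    # Assumption: one unit of resource per period, so allocations are
    # non-negative and sum to at most 1.
    if np.any(allocation < 0) or allocation.sum() > 1.0 + 1e-12:
        raise ValueError("allocation must be non-negative and sum to at most 1")

    # Assumption: Gaussian noise as a concrete sub-Gaussian example.
    xi = rng.normal(0.0, noise_std, size=means.shape)

    # Assumption: an arm that receives no resource yields no reward and no
    # noise. Without this mask, 0**0 evaluates to 1 when b = 0.
    noise_scale = np.where(allocation > 0, allocation ** b, 0.0)
    return allocation * means + noise_scale * xi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = [0.3, 0.5, 0.8]
    allocation = [0.2, 0.3, 0.5]
    for b in (0.0, 0.5, 1.0):
        print(b, sample_rewards(allocation, means, b, rng))
```

At $b = 1$ the ratio $Y_{i}/A_{i} = \mu_{i} + \xi_{i}$ has the same noise level for every arm with positive allocation, so each such arm yields an equally informative observation, which matches the full-feedback end of the range. At $b = 0$ the noise does not shrink with the allocation, so splitting the resource weakens the signal from each arm, as in the standard bandit setting.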

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Data Stream Mining Techniques