Combinatorial Multi-armed Bandits for Resource Allocation
Jinhang Zuo, Carlee Joe-Wong

TL;DR
This paper studies sequential resource allocation with combinatorial multi-armed bandit algorithms. The goal is to maximize expected reward (equivalently, to minimize cumulative regret) in settings such as wireless spectrum sharing and computing-time allocation.
Contribution
It introduces new algorithms for combinatorial multi-armed bandits with discrete and continuous budgets, achieving logarithmic regret bounds under semi-bandit feedback.
Findings
The proposed algorithms achieve logarithmic regret bounds under semi-bandit feedback.
They learn effective allocation strategies when each user's reward process is initially unknown.
The approach applies to wireless spectrum and computing resource management.
Abstract
We study the sequential resource allocation problem where a decision maker repeatedly allocates budgets between resources. Motivating examples include allocating limited computing time or wireless spectrum bands to multiple users (i.e., resources). At each timestep, the decision maker should distribute its available budgets among different resources to maximize the expected reward, or equivalently to minimize the cumulative regret. In doing so, the decision maker should learn the value of the resources allocated for each user from feedback on each user's received reward. For example, users may send messages of different urgency over wireless spectrum bands; the reward generated by allocating spectrum to a user then depends on the message's urgency. We assume each user's reward follows a random process that is initially unknown. We design combinatorial multi-armed bandit algorithms to…
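To make the setting concrete, here is a minimal sketch of a UCB-style learner for a discrete-budget version of the problem. It is not the paper's algorithm, only a generic illustration under assumptions of our own: rewards are Bernoulli, each (user, allocation level) pair is treated as a base arm, feedback is observed per user (semi-bandit), and the names `best_allocation`, `ucb_index`, `run_ucb_allocation` and the exploration constant 1.5 are hypothetical choices.

```python
import math
import random


def best_allocation(values, budget):
    """Exact DP oracle: give each user one allocation level, total <= budget.

    values[i][a] is the (estimated) reward of giving user i exactly a units.
    """
    best = [0.0] * (budget + 1)  # best[b]: best value so far with at most b units
    choice = []                  # choice[i][b]: units for user i given b units for users 0..i
    for user_values in values:
        new_best = [-math.inf] * (budget + 1)
        pick = [0] * (budget + 1)
        for b in range(budget + 1):
            for a in range(min(b, len(user_values) - 1) + 1):
                v = best[b - a] + user_values[a]
                if v > new_best[b]:
                    new_best[b], pick[b] = v, a
        best = new_best
        choice.append(pick)
    # Backtrack from the last user to recover the allocation.
    alloc = [0] * len(values)
    b = budget
    for i in reversed(range(len(values))):
        alloc[i] = choice[i][b]
        b -= alloc[i]
    return alloc


def ucb_index(mean, count, t):
    """Optimistic estimate of a reward in [0, 1]; unplayed arms get the maximum."""
    if count == 0:
        return 1.0
    return min(1.0, mean + math.sqrt(1.5 * math.log(t) / count))


def run_ucb_allocation(true_means, budget, horizon, seed=0):
    """Simulate the learner; true_means[i][a] is hidden from the learner (assumed Bernoulli)."""
    rng = random.Random(seed)
    counts = [[0] * len(m) for m in true_means]
    means = [[0.0] * len(m) for m in true_means]
    total_reward = 0.0
    for t in range(1, horizon + 1):
        ucb = [
            [ucb_index(means[i][a], counts[i][a], t) for a in range(len(true_means[i]))]
            for i in range(len(true_means))
        ]
        alloc = best_allocation(ucb, budget)
        # Semi-bandit feedback: observe every user's own reward.
        for i, a in enumerate(alloc):
            reward = 1.0 if rng.random() < true_means[i][a] else 0.0
            counts[i][a] += 1
            means[i][a] += (reward - means[i][a]) / counts[i][a]
            total_reward += reward
    return means, counts, total_reward
```

For instance, `best_allocation([[0.0, 0.5, 0.6], [0.0, 0.3, 0.9]], 3)` returns `[1, 2]`: one unit to the first user and two to the second. Calling `run_ucb_allocation(true_means, budget=3, horizon=1000)` with a hypothetical `true_means` table then simulates the learning loop. The dynamic program is exact only because the objective here is assumed to be a sum of per-user rewards.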
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Wireless Network Optimization · Optimization and Search Problems
