Multi-Armed Bandits with Censored Consumption of Resources
Viktor Bengs, Eyke Hüllermeier

TL;DR
This paper introduces a resource-aware multi-armed bandit model with censored rewards, proposing a UCB-inspired algorithm that balances reward maximization and resource minimization, supported by theoretical analysis and simulations.
Contribution
It formulates a novel bandit problem with resource constraints and censored feedback, and develops a new algorithm with proven regret bounds.
Findings
The proposed algorithm outperforms standard bandit algorithms in simulations.
Theoretical regret bounds are established for the new algorithm.
Resource-aware exploration improves reward realization under resource limits.
Abstract
We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret, which incorporates the actual amount of allocated resources of each learning round as well as the optimality of realizable rewards. Thus, to minimize regret, the learner needs to set a resource limit and choose an arm in such a way that the chance to realize a high reward within the predefined resource limit is high, while the resource limit itself should be kept as low as possible. We propose a UCB-inspired online learning algorithm, which we analyze theoretically in terms of its…
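The interaction protocol in the abstract can be simulated with a small sketch. This is not the paper's regret measure or its UCB-inspired algorithm. Everything below is an illustrative assumption:

- rewards are Bernoulli;
- resource consumption is exponentially distributed;
- the learner picks the limit from a finite grid;
- the per-round utility is the realized (possibly censored) reward minus a linear cost on the allocated limit, weighted by a hypothetical `cost_per_unit`.

As a naive baseline, the hypothetical `ucb_over_pairs` runs plain UCB1 over every (arm, limit) pair. The class name `CensoredResourceBandit` is likewise made up for illustration.

```python
import math
import random


class CensoredResourceBandit:
    """Hypothetical environment: Bernoulli rewards, exponential consumption (assumptions)."""

    def __init__(self, reward_means, consumption_means, cost_per_unit, seed=0):
        self.reward_means = reward_means            # P(reward = 1) per arm
        self.consumption_means = consumption_means  # mean resource consumption per arm
        self.cost_per_unit = cost_per_unit          # hypothetical price of allocated resources
        self.rng = random.Random(seed)

    def pull(self, arm, limit):
        # Draw the random amount of consumed resources and the random reward.
        consumption = self.rng.expovariate(1.0 / self.consumption_means[arm])
        reward = 1.0 if self.rng.random() < self.reward_means[arm] else 0.0
        # Censoring: the reward is only realized if consumption stays below the limit.
        realized = reward if consumption < limit else 0.0
        # Hypothetical utility: realized reward minus the cost of the allocated limit.
        return realized - self.cost_per_unit * limit


def ucb_over_pairs(env, n_arms, limits, horizon):
    """Naive baseline: UCB1 treating each (arm, limit) pair as a separate arm."""
    pairs = [(k, b) for k in range(n_arms) for b in limits]
    counts = [0] * len(pairs)
    sums = [0.0] * len(pairs)
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= len(pairs):
            i = t - 1  # initialization: play every pair once
        else:
            # Empirical mean utility plus the standard UCB1 exploration bonus.
            i = max(
                range(len(pairs)),
                key=lambda j: sums[j] / counts[j] + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        arm, limit = pairs[i]
        u = env.pull(arm, limit)
        counts[i] += 1
        sums[i] += u
        total += u
    return counts, total
```

For example, `ucb_over_pairs(CensoredResourceBandit([0.9, 0.2], [1.0, 1.0], 0.05), 2, [0.5, 2.0, 5.0], 2000)` should concentrate most plays on arm 0.

Treating each limit as an independent arm ignores the structure the abstract points to: a larger limit censors less often but costs more. Exploiting that trade-off is what a dedicated algorithm would do.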
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
