Budget-Constrained Bandits over General Cost and Reward Distributions
Semih Cayci, Atilla Eryilmaz, R. Srikant

TL;DR
This paper studies a budget-constrained bandit problem in which each arm pull incurs a random cost and yields a random reward, possibly correlated with that cost. It proposes algorithms that achieve near-optimal regret bounds in both the general and the Gaussian case.
Contribution
It introduces algorithms that exploit cost-reward correlation and provides tight regret bounds, including a lower bound, for budget-constrained bandits with general cost and reward distributions.
Findings
Achieves $O(\log B)$ regret under certain moment conditions.
Proposes algorithms that use linear minimum mean-square error (LMMSE) estimation to exploit the cost-reward correlation.
Establishes regret bounds that are optimal up to a constant factor when cost-reward pairs are jointly Gaussian.
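Linear MMSE estimation, listed among the findings as the tool for exploiting cost-reward correlation, can be illustrated in isolation. The sketch below is a generic example and is not the authors' estimator or algorithm. It fits the sample LMMSE estimate of a reward $R$ from its cost $C$, $\hat{R} = \mu_R + \beta (C - \mu_C)$ with $\beta = \mathrm{Cov}(C,R)/\mathrm{Var}(C)$. It also reports the residual variance $\mathrm{Var}(R) - \mathrm{Cov}(C,R)^2/\mathrm{Var}(C)$, which is what remains once the part shared with the cost is removed. For jointly Gaussian pairs the LMMSE estimate coincides with the conditional mean $\mathbb{E}[R \mid C]$. The function names and the test arm are hypothetical.

```python
import math
import random


def lmmse_coefficients(costs, rewards):
    """Fit the sample LMMSE estimator R_hat = mu_r + beta * (C - mu_c).

    Returns (mu_c, mu_r, beta, residual_var), where residual_var estimates
    Var(R) - Cov(C, R)^2 / Var(C): the error variance left after the linear
    estimate from the cost removes the correlated part of the reward.
    """
    n = len(costs)
    if n < 2 or n != len(rewards):
        raise ValueError("need at least two paired (cost, reward) samples")
    mu_c = sum(costs) / n
    mu_r = sum(rewards) / n
    # Unbiased sample covariance and variances.
    cov_cr = sum((c - mu_c) * (r - mu_r) for c, r in zip(costs, rewards)) / (n - 1)
    var_c = sum((c - mu_c) ** 2 for c in costs) / (n - 1)
    var_r = sum((r - mu_r) ** 2 for r in rewards) / (n - 1)
    if var_c == 0:
        raise ValueError("cost samples have zero variance")
    beta = cov_cr / var_c
    residual_var = var_r - cov_cr ** 2 / var_c
    return mu_c, mu_r, beta, residual_var


def correlated_gaussian_pairs(n, rho, seed=0):
    """Hypothetical arm: C = 1 + Z1, R = 2 + rho*Z1 + sqrt(1 - rho^2)*Z2.

    Costs can be negative here, as the model in the abstract allows.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((1.0 + z1, 2.0 + rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pairs


if __name__ == "__main__":
    costs, rewards = zip(*correlated_gaussian_pairs(20_000, rho=0.8))
    _, _, beta, resid = lmmse_coefficients(costs, rewards)
    # Population values for this arm: beta = 0.8, residual variance = 1 - 0.8**2 = 0.36.
    print(f"beta ~ {beta:.3f}, residual variance ~ {resid:.3f}")
```

With correlation $\rho = 0.8$ the residual variance drops from $1$ to about $0.36$. This illustrates how much of the reward's randomness a cost observation can explain.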
Abstract
We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable for a budget $B > 0$. In order to achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret…
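A small simulation can make the budget-constrained setting concrete. A common way to frame budgeted bandits, assumed here rather than taken from the paper, is that with a large budget $B$ the best fixed strategy keeps pulling the arm with the largest reward rate $\mathbb{E}[R]/\mathbb{E}[C]$. That strategy collects roughly $B \cdot \mathbb{E}[R]/\mathbb{E}[C]$, and regret measures an algorithm's shortfall against an optimal policy.

The sketch below is not one of the paper's algorithms. It only shows that a larger mean reward does not make an arm better once pulls consume budget. The names `run_fixed_arm`, `gaussian_arm`, and `best_rate_arm`, the Gaussian arms, and the stopping convention (the pull that exhausts the budget still counts) are all hypothetical.

```python
import random


def run_fixed_arm(arm, budget, seed=0):
    """Pull one arm repeatedly until the cumulative cost exceeds `budget`.

    `arm` is a callable rng -> (cost, reward). Assumed convention: the pull
    that exhausts the budget still counts; the paper's rule may differ.
    """
    rng = random.Random(seed)
    spent, total_reward, pulls = 0.0, 0.0, 0
    while spent <= budget:
        cost, reward = arm(rng)
        spent += cost
        total_reward += reward
        pulls += 1
    return total_reward, pulls


def gaussian_arm(mean_cost, mean_reward, sd=0.5):
    """Hypothetical arm with independent Gaussian cost and reward (costs may be negative)."""
    def pull(rng):
        return rng.gauss(mean_cost, sd), rng.gauss(mean_reward, sd)
    return pull


def best_rate_arm(arm_means):
    """Key of the arm with the largest E[R]/E[C] (assumes every E[C] > 0)."""
    return max(arm_means, key=lambda k: arm_means[k][1] / arm_means[k][0])


# (mean cost, mean reward) for two hypothetical arms.
arms = {
    "A": (1.0, 1.0),  # reward rate 1.0 per unit cost
    "B": (4.0, 2.0),  # larger mean reward per pull, but rate only 0.5
}

if __name__ == "__main__":
    budget = 10_000.0
    print("best reward-rate arm:", best_rate_arm(arms))
    for name, (mc, mr) in arms.items():
        total, pulls = run_fixed_arm(gaussian_arm(mc, mr), budget, seed=42)
        print(f"arm {name}: rate={mr / mc:.2f}, pulls={pulls}, "
              f"total reward={total:.0f}, B*rate={budget * mr / mc:.0f}")
```

Arm B earns more per pull but exhausts the budget four times faster. Its total therefore lands near $5{,}000$, while arm A lands near $10{,}000$.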
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
