Bandits with concave rewards and convex knapsacks
Shipra Agrawal, Nikhil R. Devanur

TL;DR
This paper introduces a general bandit model with concave rewards and convex constraints. The model extends the classic multi-armed bandit (MAB) and Bandits with Knapsacks (BwK) models, and the paper gives algorithms with near-optimal regret guarantees that apply to a range of constrained decision-making problems.
Contribution
It extends the bandit framework to include concave rewards and convex constraints, offering simple UCB-based algorithms with provable regret bounds for this broad setting.
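For readers new to the UCB family, here is a minimal sketch of classic UCB1 for the standard multi-armed bandit, the special case that the paper's algorithms extend. This is the textbook index, not the paper's algorithm. The Bernoulli arms, the `ucb1` name, and the exploration constant 2 are assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Classic UCB1 on Bernoulli arms; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialize: pull each arm once
        else:
            # Optimistic index: empirical mean plus a confidence radius.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.8], 2000)
print(counts)  # most pulls go to the arm with mean 0.8
```

The paper's setting replaces the scalar reward with an outcome vector and the "best arm" with a concave objective and convex constraints, but the same principle of acting on optimistic estimates carries over.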
Findings
Near-optimal regret guarantees for the proposed algorithms, matching the bounds of Badanidiyuru et al. [2013] in the BwK special case
Connections established with Blackwell approachability, online convex optimization, and Frank-Wolfe methods
Applications show that the general model supports richer problem formulations than MAB and BwK
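The Frank-Wolfe connection can be illustrated with a hedged sketch, written in that spirit and not reproducing the authors' algorithm. It assumes a smooth concave reward f of the running average of observed outcome vectors and linearizes f at that average on every step. It then pulls the arm whose optimistic outcome estimate maximizes the linearized objective, so the running average moves toward a vertex with step size 1/t, as in Frank-Wolfe. The reward f(x) = Σ_j log(0.1 + x_j), the confidence radius, and all function names are hypothetical.

```python
import math
import random


def grad_log_utility(x, eps=0.1):
    # Gradient of the hypothetical concave reward f(x) = sum_j log(eps + x_j).
    return [1.0 / (eps + xj) for xj in x]


def frank_wolfe_bandit(arm_means, horizon, seed=0):
    """Pick arms by linearizing f at the running average of observed outcomes."""
    rng = random.Random(seed)
    k, d = len(arm_means), len(arm_means[0])
    counts = [0] * k
    sums = [[0.0] * d for _ in range(k)]  # per-arm sums of outcome vectors
    total = [0.0] * d  # sum of all observed outcome vectors
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialize: pull every arm once
        else:
            avg = [s / (t - 1) for s in total]
            theta = grad_log_utility(avg)

            def optimistic_score(i):
                r = min(1.0, math.sqrt(2 * math.log(t) / counts[i]))
                score = 0.0
                for j in range(d):
                    mean = sums[i][j] / counts[i]
                    # Optimism for a linear objective: upper bound where
                    # theta_j >= 0, lower bound otherwise, clipped to [0, 1].
                    if theta[j] >= 0:
                        est = min(1.0, mean + r)
                    else:
                        est = max(0.0, mean - r)
                    score += theta[j] * est
                return score

            arm = max(range(k), key=optimistic_score)
        # Each coordinate of the outcome is an independent Bernoulli draw.
        outcome = [1.0 if rng.random() < m else 0.0 for m in arm_means[arm]]
        counts[arm] += 1
        for j in range(d):
            sums[arm][j] += outcome[j]
            total[j] += outcome[j]
    return counts, [s / horizon for s in total]


counts, avg = frank_wolfe_bandit([[1.0, 0.0], [0.0, 1.0]], 1000)
print(counts, avg)  # pulls end up roughly balanced between the two arms
```

With one arm producing [1, 0] and the other [0, 1], the concave reward favors a balanced average. Any imbalance raises the gradient on the under-served coordinate, so the sketch keeps steering toward a mix of the two arms.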
Abstract
In this paper, we consider a very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon. This model subsumes the classic multi-armed bandit (MAB) model, and the Bandits with Knapsacks (BwK) model of Badanidiyuru et al. [2013]. We also consider an extension of this model to allow linear contexts, similar to the linear contextual extension of the MAB model. We demonstrate that a natural and simple extension of the UCB family of algorithms for MAB provides a polynomial time algorithm that has near-optimal regret guarantees for this substantially more general model, and matches the bounds provided by Badanidiyuru et al. [2013] for the special case of BwK, which is quite surprising. We also provide computationally more efficient algorithms by…
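A minimal sketch of the kind of objective the abstract describes, under assumed notation: each pull yields an outcome vector in [0,1]^d, the reward is a concave function f of the time-averaged outcome vector, and the convex constraint asks that this average lie in a set S. The choice f = min (a max-min fairness objective) and the budget set S = {x : x_j ≤ b_j} are hypothetical examples, not taken from the paper.

```python
import math


def average(vectors):
    # Time-averaged outcome vector (1/T) * sum_t v_t.
    d = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(d)]


def min_reward(x):
    # Hypothetical concave reward: the worst coordinate (max-min fairness).
    return min(x)


def dist_to_budget_set(x, budget):
    # Euclidean distance from x to the convex set S = {y : y_j <= budget_j}.
    # Projecting onto S clips each coordinate at budget_j.
    return math.sqrt(sum(max(0.0, xj - bj) ** 2 for xj, bj in zip(x, budget)))


outcomes = [[1.0, 0.0], [0.0, 1.0]]
avg = average(outcomes)  # [0.5, 0.5]
print(min_reward(avg))  # 0.5: reward of the aggregate
print(sum(min_reward(v) for v in outcomes) / len(outcomes))  # 0.0
print(dist_to_budget_set(avg, [0.4, 1.0]))  # approximately 0.1
```

Because f is applied to the aggregate rather than to each round, alternating between the two outcomes in this example earns 0.5 while committing to either one earns 0, so a good policy may need to mix arms. The constraint is scored by how far the average lands from S.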
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
