Bandits with concave rewards and convex knapsacks

Shipra Agrawal; Nikhil R. Devanur

arXiv:1402.5758 · cs.LG · February 25, 2014 · 24 cites

TL;DR

This paper introduces a highly general bandit model with concave rewards and convex constraints that extends the classic multi-armed bandit and Bandits with Knapsacks models. It gives algorithms with near-optimal regret guarantees that apply to a wide range of complex decision-making problems.

Contribution

It extends the bandit framework to include concave rewards and convex constraints, offering simple UCB-based algorithms with provable regret bounds for this broad setting.
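For readers unfamiliar with the UCB family these algorithms build on, the following is a minimal sketch of the standard UCB1 rule for the classic multi-armed bandit: pull each arm once, then always pull the arm with the largest empirical mean plus an exploration bonus of sqrt(2 ln t / n). It is not the paper's algorithm. The function names `ucb1_index` and `run_ucb1`, the Bernoulli reward model, and the fixed seed are illustrative assumptions.

```python
import math
import random


def ucb1_index(mean_reward: float, pulls: int, t: int) -> float:
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(arm_probs, horizon, seed=0):
    """Hypothetical simulator: run UCB1 on Bernoulli arms, return pull counts per arm."""
    rng = random.Random(seed)
    k = len(arm_probs)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every empirical mean is defined
        else:
            # Optimism: choose the arm whose upper confidence index is largest.
            arm = max(
                range(k),
                key=lambda i: ucb1_index(sums[i] / pulls[i], pulls[i], t),
            )
        # Assumed reward model: Bernoulli with the arm's success probability.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls
```

For example, `run_ucb1([0.2, 0.5, 0.8], 5000)` should allocate most of the 5000 pulls to the third arm, since the shrinking bonus makes further exploration of clearly worse arms increasingly rare.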

Findings

01

Near-optimal regret guarantees achieved by the proposed algorithms

02

Connections established with Blackwell approachability, online convex optimization, and Frank-Wolfe methods

03

Applications show that this general model supports richer problem formulations

Abstract

In this paper, we consider a very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon. This model subsumes the classic multi-armed bandit (MAB) model, and the Bandits with Knapsacks (BwK) model of Badanidiyuru et al. [2013]. We also consider an extension of this model to allow linear contexts, similar to the linear contextual extension of the MAB model. We demonstrate that a natural and simple extension of the UCB family of algorithms for MAB provides a polynomial time algorithm that has near-optimal regret guarantees for this substantially more general model, and matches the bounds provided by Badanidiyuru et al. [2013] for the special case of BwK, which is quite surprising. We also provide computationally more efficient algorithms by…
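The idea of pairing optimistic (UCB-style) estimates with a concave objective of the average outcome can be illustrated with a small sketch. This is not the paper's algorithm: it is a Frank-Wolfe-style heuristic assumed here for illustration (Frank-Wolfe being one of the connections the paper lists). Each pull returns a vector outcome; at every step the objective is linearized at the current average, and the arm whose optimistic outcome vector scores best along that gradient is pulled. The log-utility objective f(x) = Σ_j log(ε + x_j), the Bernoulli outcome coordinates, the UCB1-style confidence radius, and all function names are hypothetical, and the convex constraint set is ignored entirely.

```python
import math
import random


def grad_log_utility(x, eps=1e-3):
    """Gradient of the illustrative concave objective f(x) = sum_j log(eps + x_j)."""
    return [1.0 / (eps + xj) for xj in x]


def optimistic_score(g, mean, n, t):
    """Largest value of g . v over the confidence box around `mean`.

    Uses the upper bound where g_j >= 0 and the lower bound where g_j < 0,
    with a UCB1-style radius and outcomes assumed to lie in [0, 1].
    """
    radius = math.sqrt(2.0 * math.log(t) / n)
    score = 0.0
    for gj, mj in zip(g, mean):
        bound = min(1.0, mj + radius) if gj >= 0 else max(0.0, mj - radius)
        score += gj * bound
    return score


def run_fw_ucb(arm_means, horizon, seed=0):
    """Hypothetical Frank-Wolfe-style heuristic: return the final average outcome vector."""
    rng = random.Random(seed)
    k, d = len(arm_means), len(arm_means[0])
    pulls = [0] * k
    sums = [[0.0] * d for _ in range(k)]  # per-arm sum of observed vectors
    total = [0.0] * d                      # sum of all observed vectors
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every estimate is defined
        else:
            # Linearize f at the current average, then pick the arm whose
            # optimistic outcome vector scores best along that gradient.
            g = grad_log_utility([s / (t - 1) for s in total])
            arm = max(
                range(k),
                key=lambda i: optimistic_score(
                    g, [s / pulls[i] for s in sums[i]], pulls[i], t
                ),
            )
        # Assumed outcome model: each coordinate is an independent Bernoulli draw.
        v = [1.0 if rng.random() < m else 0.0 for m in arm_means[arm]]
        pulls[arm] += 1
        for j in range(d):
            sums[arm][j] += v[j]
            total[j] += v[j]
    return [s / horizon for s in total]
```

With two arms whose mean outcome vectors are (0.9, 0.1) and (0.1, 0.9), `run_fw_ucb([[0.9, 0.1], [0.1, 0.9]], 2000)` should end with a roughly balanced average, because the log objective rewards mixing the arms rather than committing to a single one, something a scalar-reward bandit objective cannot express. Choosing the confidence-box corner by the sign of each gradient coordinate keeps the score optimistic even when the objective penalizes some dimension.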

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics