An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

Shipra Agrawal; Nikhil R. Devanur; Lihong Li

arXiv:1506.03374·cs.LG·July 12, 2016·25 cites

TL;DR

This paper introduces a computationally efficient algorithm for contextual bandits with knapsack constraints. It achieves slightly better regret bounds than the earlier, computationally inefficient algorithm of Badanidiyuru et al. (2014), extends to concave objectives, and addresses a key open problem in the field.

Contribution

It provides the first computationally efficient algorithm for contextual bandits with knapsack constraints, with slightly better regret bounds than prior work and a running time that scales logarithmically with the size of the policy space. It also extends the approach to concave objectives.

Findings

01

The algorithm achieves near-optimal regret bounds.

02

Computation time scales logarithmically with the size of the policy space.

03

Extends to Lipschitz concave objective functions.

Abstract

We consider a contextual version of the multi-armed bandit problem with global knapsack constraints. In each round, the outcome of pulling an arm is a scalar reward and a resource consumption vector, both dependent on the context, and the global knapsack constraints require the total consumption of each resource to stay below some pre-fixed budget. The learning agent competes with an arbitrary set of context-dependent policies. This problem was introduced by Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm with near-optimal regret bounds for it. We give a computationally efficient algorithm for this problem with slightly better regret bounds, by generalizing the approach of Agarwal et al. (2014) for the non-constrained version of the problem. The computational time of our algorithm scales logarithmically in the size of the policy space. This answers the main open…
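
To make the problem setting concrete, the interaction protocol described in the abstract can be sketched as a small simulation. This is only an illustration of the setting, not the paper's algorithm: the function names (`run_cbwk`, `uniform_policy`), the scalar context, the synthetic reward and consumption model, and the rule of stopping as soon as a pull would exceed a budget are all assumptions made for this example.

```python
import random


def uniform_policy(context, num_arms, rng):
    """Hypothetical baseline policy: ignore the context, pick an arm at random."""
    return rng.randrange(num_arms)


def run_cbwk(policy, horizon, budget, num_arms, num_resources, seed=0):
    """Simulate contextual bandits with knapsacks under a synthetic outcome model.

    Each round yields a scalar reward and a consumption vector that both depend
    on the context. Play stops when any resource budget would be exceeded.
    """
    rng = random.Random(seed)
    remaining = [budget] * num_resources
    total_reward = 0.0
    rounds_played = 0

    for _ in range(horizon):
        context = rng.random()  # assumed: scalar context in [0, 1)
        arm = policy(context, num_arms, rng)

        # Assumed synthetic outcomes (not from the paper): higher-indexed arms
        # pay more on average, and consumption grows with the context value.
        reward = rng.random() * (arm + 1) / num_arms
        consumption = [rng.random() * context for _ in range(num_resources)]

        # Global knapsack constraint: total consumption per resource
        # must stay within the pre-fixed budget.
        if any(c > r for c, r in zip(consumption, remaining)):
            break

        remaining = [r - c for r, c in zip(remaining, consumption)]
        total_reward += reward
        rounds_played += 1

    return total_reward, remaining, rounds_played


total, remaining, played = run_cbwk(
    uniform_policy, horizon=50, budget=5.0, num_arms=3, num_resources=2
)
print(f"rounds played: {played}, total reward: {total:.3f}")
```

A learning algorithm for this setting would replace `uniform_policy` with a rule that uses past contexts, rewards, and consumption to choose arms, and its regret would be measured against the best policy in a given policy set under the same budgets.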

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems