Linear Contextual Bandits with Knapsacks

Shipra Agrawal; Nikhil R. Devanur

arXiv:1507.06738·cs.LG·July 12, 2016·35 cites



TL;DR

This paper gives algorithms for the linear contextual bandit problem with resource (knapsack) constraints that achieve near-optimal regret bounds. The problem it studies generalizes linear contextual bandits, bandits with knapsacks, and online stochastic packing.
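The unification claim can be made concrete with a small sketch. It assumes the linear model from the abstract (an arm with context x has expected reward mu @ x and expected consumption x @ W, one entry per resource); the function `expected_outcomes` and all values are hypothetical. Under that model, bandits with knapsacks corresponds to fixed one-hot contexts (one per arm, so nothing is shared across arms), and linear contextual bandits to the case with no resources. The online stochastic packing case is not shown.

```python
import numpy as np


def expected_outcomes(contexts, mu, W):
    """Expected reward and consumption for every arm under a linear model.

    contexts: (K, m) array, one context vector per arm
    mu:       (m,) reward weights
    W:        (m, d) consumption weights, one column per resource
    """
    return contexts @ mu, contexts @ W


K = 3
mu = np.array([0.5, 0.7, 0.2])
W = np.array([[0.1, 0.3],
              [0.4, 0.2],
              [0.0, 0.1]])

# BwK: arm a always sees the one-hot context e_a, so its expected reward is
# mu[a] and its expected consumption is row W[a], an unstructured per-arm table.
bwk_rewards, bwk_costs = expected_outcomes(np.eye(K), mu, W)

# linContextual: arbitrary contexts but zero resources (d = 0), so no budget applies.
rng = np.random.default_rng(0)
X = rng.uniform(size=(K, 4))
lin_rewards, lin_costs = expected_outcomes(X, np.ones(4), np.zeros((4, 0)))
```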

Contribution

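It presents new algorithms with near-optimal regret bounds for a generalized bandit problem that combines linear contexts with resource constraints: each arm pull yields a reward and a vector of resource consumptions, and total consumption must stay within the budget for every resource.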
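Written out, the model described in the abstract might look as follows. The symbols are notation chosen here for illustration and may differ from the paper's: T rounds, K arms, d resources, an m-dimensional context x_t(a) for each arm a in round t, unknown parameters \mu_* \in \mathbb{R}^m and W_* \in \mathbb{R}^{m \times d}, budgets B_j, and a stopping round \tau \le T.

```latex
\begin{aligned}
\mathbb{E}\big[r_t \mid x_t(a_t)\big] &= \mu_*^{\top} x_t(a_t), \\
\mathbb{E}\big[v_t \mid x_t(a_t)\big] &= W_*^{\top} x_t(a_t) \in \mathbb{R}^{d}, \\
\sum_{t=1}^{\tau} v_{t,j} &\le B_j \quad \text{for every resource } j = 1, \dots, d, \\
\text{objective:}\quad &\max \; \mathbb{E}\Big[\sum_{t=1}^{\tau} r_t\Big].
\end{aligned}
```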

Findings

01

The proposed algorithms achieve near-optimal regret bounds.

02

The approach unifies and extends linear contextual bandits, bandits with knapsacks, and online stochastic packing.

03

The regret bounds compare favorably with results for the unstructured version of the problem, where the relation between contexts and outcomes can be arbitrary.
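For budgeted bandits, regret is commonly measured against the best policy that knows the true parameters. A sketch of that definition, with notation chosen here for illustration (OPT is the expected total reward of an optimal budget-respecting policy that knows the true linear parameters, r_t is the reward in round t, and \tau is the round at which the algorithm stops); the paper's exact benchmark may differ in details:

```latex
\mathrm{Regret}(T) \;=\; \mathrm{OPT} \;-\; \mathbb{E}\Big[\sum_{t=1}^{\tau} r_t\Big]
```

A regret bound is usually called near-optimal when it matches a known lower bound up to logarithmic factors.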

Abstract

We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn't exceed the budget for each resource. The objective is once again to maximize the total reward. This problem turns out to be a common generalization of classic linear contextual bandits (linContextual), bandits with knapsacks (BwK), and the online stochastic packing problem (OSPP). We present algorithms with near-optimal regret bounds for this problem. Our bounds compare favorably to results on the unstructured version of the problem where the relation between the contexts and the outcomes could be arbitrary, but the…
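To make the setting in the abstract concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: contexts drawn uniformly from [0, 1]^m, Gaussian noise clipped at zero, the same budget for every resource, and the hypothetical names `LinearBwKEnv` and `optimistic_price_heuristic`. The policy is a simplified heuristic in the spirit of primal-dual BwK methods: it combines optimistic ridge-regression estimates (as in linear contextual bandits) with multiplicative-weights prices on resources. It is not the authors' algorithm and carries none of the paper's guarantees.

```python
import numpy as np


class LinearBwKEnv:
    """Hypothetical simulator: K arms, contexts in [0, 1]^m, d resources.

    Pulling an arm with context x returns a reward with mean mu @ x and a
    consumption vector with mean x @ W, both perturbed by Gaussian noise and
    clipped at zero. Every name here is illustrative, not from the paper.
    """

    def __init__(self, mu, W, budget, K, noise=0.05, seed=0):
        self.mu, self.W, self.K, self.noise = mu, W, K, noise
        self.m, self.d = W.shape
        self.remaining = np.full(self.d, float(budget))  # same budget for each resource
        self.rng = np.random.default_rng(seed)

    def contexts(self):
        # Fresh context vectors for all K arms this round.
        return self.rng.uniform(0.0, 1.0, size=(self.K, self.m))

    def pull(self, x):
        r = max(0.0, float(x @ self.mu) + self.noise * self.rng.standard_normal())
        v = x @ self.W + self.noise * self.rng.standard_normal(self.d)
        return r, np.maximum(0.0, v)


def optimistic_price_heuristic(env, T, Z=1.0, alpha=0.5, eta=0.1):
    """Optimistic ridge estimates plus multiplicative-weights resource prices.

    A simplified heuristic, not the paper's algorithm. Z, a hand-set
    assumption, scales price-weighted consumption into reward units.
    """
    A = np.eye(env.m)                    # ridge Gram matrix: I + sum of x x^T
    b_r = np.zeros(env.m)                # sum of x * observed reward
    b_v = np.zeros((env.m, env.d))       # sum of outer(x, observed consumption)
    price = np.full(env.d, 1.0 / env.d)  # one weight per resource, on the simplex
    per_round = env.remaining / T        # even per-round share of each budget
    total_reward, rounds = 0.0, 0
    for _ in range(T):
        X = env.contexts()
        A_inv = np.linalg.inv(A)
        mu_hat, W_hat = A_inv @ b_r, A_inv @ b_v
        # Confidence width alpha * sqrt(x^T A^{-1} x) for each arm's context.
        quad = np.einsum("ki,ij,kj->k", X, A_inv, X)
        width = alpha * np.sqrt(np.maximum(quad, 0.0))
        r_ucb = X @ mu_hat + width                           # optimistic reward
        v_lcb = np.maximum(0.0, X @ W_hat - width[:, None])  # optimistic (low) consumption
        a = int(np.argmax(r_ucb - Z * (v_lcb @ price)))
        r, v = env.pull(X[a])
        if np.any(v > env.remaining):
            break  # this pull would overdraw a budget: stop without counting it
        env.remaining -= v
        total_reward += r
        rounds += 1
        # Update the regression statistics with the observed outcome.
        A += np.outer(X[a], X[a])
        b_r += r * X[a]
        b_v += np.outer(X[a], v)
        # Raise the price of resources consumed faster than their per-round share.
        price *= np.exp(eta * (v - per_round))
        price /= price.sum()
    return total_reward, rounds


mu = np.array([0.6, 0.2, 0.4])
W = np.array([[0.3, 0.1], [0.1, 0.4], [0.2, 0.2]])
env = LinearBwKEnv(mu, W, budget=50.0, K=5, seed=1)
reward, rounds = optimistic_price_heuristic(env, T=500)
```

The design choice here is to fold all budget constraints into one score per arm, optimistic reward minus Z times the price-weighted optimistic consumption, so a single argmax picks the arm. A larger Z makes the heuristic more conservative with resources; how to set this tradeoff well is exactly what a principled algorithm must handle.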


Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Grid Energy Management