Linear Contextual Bandits with Knapsacks
Shipra Agrawal, Nikhil R. Devanur

TL;DR
This paper introduces algorithms for the linear contextual bandit problem with resource constraints, achieving near-optimal regret bounds and unifying several related problems in online learning.
Contribution
It presents new algorithms with near-optimal regret bounds for a generalized bandit problem in which each arm's expected reward and expected resource consumption depend linearly on its context, subject to budget constraints on every resource.
Findings
Algorithms achieve near-optimal regret bounds.
The approach unifies and extends linear contextual bandits, bandits with knapsacks, and online stochastic packing.
The regret bounds compare favorably to known results for the unstructured version of the problem, in which the relation between contexts and outcomes can be arbitrary.
Abstract
We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn't exceed the budget for each resource. The objective is once again to maximize the total reward. This problem turns out to be a common generalization of classic linear contextual bandits (linContextual), bandits with knapsacks (BwK), and the online stochastic packing problem (OSPP). We present algorithms with near-optimal regret bounds for this problem. Our bounds compare favorably to results on the unstructured version of the problem where the relation between the contexts and the outcomes could be arbitrary, but the…
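The interaction protocol described in the abstract can be sketched as a small simulation. This is only an illustration of the problem setting, not the paper's algorithm: the arm is chosen uniformly at random, outcomes are assumed to be Bernoulli with means linear in the chosen context, and play is assumed to stop just before any budget would be exceeded. The function name `simulate_lincbwk` and all parameters (`T` rounds, `K` arms, context dimension `m`, `d` resources, `budget`) are hypothetical names chosen for this sketch.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def simulate_lincbwk(T=200, K=3, m=4, d=2, budget=30.0, seed=0):
    """Run a uniformly random policy in a toy linear bandit with budgets.

    Every name and default value here is an illustrative assumption.
    """
    rng = random.Random(seed)
    # Hidden parameters: reward weights and one weight vector per resource.
    mu = [rng.random() for _ in range(m)]
    W = [[rng.random() for _ in range(m)] for _ in range(d)]

    consumed = [0.0] * d
    total_reward = 0.0
    rounds = 0
    for _t in range(T):
        # One context per arm, normalized to sum to 1 so that the
        # linear expectations below stay inside [0, 1].
        contexts = []
        for _k in range(K):
            raw = [rng.random() + 1e-9 for _ in range(m)]
            s = sum(raw)
            contexts.append([v / s for v in raw])

        # Placeholder policy: pick an arm uniformly at random.
        x = contexts[rng.randrange(K)]

        # Outcomes: a reward and a consumption vector whose expected
        # values are linear in the chosen arm's context.
        reward = 1.0 if rng.random() < dot(mu, x) else 0.0
        cons = [1.0 if rng.random() < dot(w, x) else 0.0 for w in W]

        # Assumed stopping rule: halt before any budget is exceeded.
        if any(c + u > budget for c, u in zip(consumed, cons)):
            break
        consumed = [c + u for c, u in zip(consumed, cons)]
        total_reward += reward
        rounds += 1

    return {"reward": total_reward, "consumed": consumed, "rounds": rounds}


if __name__ == "__main__":
    print(simulate_lincbwk())
```

A learning algorithm for this setting would replace the random arm choice with a rule that estimates the hidden weight vectors from observed outcomes while tracking the remaining budget; the sketch leaves that part out deliberately.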
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Grid Energy Management
