Clustered Linear Contextual Bandits with Knapsacks
Yichuan Deng, Michalis Mamakos, Zhao Song

TL;DR
This paper introduces an algorithm for clustered linear contextual bandits with knapsacks that learns the unknown cluster memberships together with the reward and resource-consumption models. It achieves regret sublinear in the number of time periods while respecting resource constraints, and it performs clustering only once, on a randomly selected subset of the arms.
Contribution
The paper proposes a new algorithm for clustered contextual bandits with knapsack constraints that requires only one clustering step, combining techniques from econometrics and from bandits.
Findings
Achieves regret sublinear in the number of time periods in resource-constrained clustered bandits.
Requires clustering only once, on a randomly selected subset of the arms.
Effectively balances reward maximization and resource constraints.
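To make the "cluster once on a subset" idea concrete, here is a minimal sketch in Python. It is not the paper's procedure: the one-dimensional contexts, the least-squares slope estimate, the gap-based grouping, and every name (`estimate_slope`, `cluster_once`, `tol`) are assumptions chosen only for illustration.

```python
import random

def estimate_slope(samples):
    """Least-squares slope through the origin for a list of (x, y) pairs."""
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    return sxy / sxx

def cluster_once(samples_by_arm, subset_size, tol, seed=0):
    """Group a random subset of arms by their estimated slopes, in one pass.

    Hypothetical stand-in for a clustering step: arms whose sorted
    estimates are within `tol` of their neighbour share a cluster.
    """
    rng = random.Random(seed)
    # Draw the subset of arms once; the remaining arms are never clustered.
    subset = rng.sample(sorted(samples_by_arm), subset_size)
    estimates = sorted((estimate_slope(samples_by_arm[a]), a) for a in subset)
    clusters = [[estimates[0][1]]]
    for (prev, _), (cur, arm) in zip(estimates, estimates[1:]):
        if cur - prev > tol:
            clusters.append([])  # a large gap starts a new cluster
        clusters[-1].append(arm)
    return clusters

# Six arms, two hidden clusters with slopes 1.0 and 3.0 (noiseless data).
true_slope = {0: 1.0, 1: 1.0, 2: 1.0, 3: 3.0, 4: 3.0, 5: 3.0}
data = {a: [(x, s * x) for x in (1.0, 2.0)] for a, s in true_slope.items()}
clusters = cluster_once(data, subset_size=4, tol=0.5)
```

In this toy setup each returned group contains only arms that share a true slope, so the memberships recovered from the subset are correct even though two arms were never examined.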
Abstract
In this work, we study clustered contextual bandits where rewards and resource consumption are the outcomes of cluster-specific linear models. The arms are divided into clusters, and the cluster memberships are unknown to the algorithm. Pulling an arm in a time period yields a reward and consumes some amount of each of multiple resources; the algorithm terminates as soon as the total consumption of any resource exceeds its constraint. Thus, maximizing the total reward requires learning not only models of the reward and the resource consumption, but also the cluster memberships. We provide an algorithm that achieves regret sublinear in the number of time periods, without requiring access to all of the arms. In particular, we show that it suffices to perform clustering only once, on a randomly selected subset of the arms. To achieve this result, we provide a sophisticated…
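The interaction protocol described in the abstract can be sketched as a short simulation. This is an illustrative sketch under stated assumptions, not the authors' algorithm: the function and parameter names (`run_episode`, `reward_params`, `cost_params`, `policy`) are hypothetical, the noise is Gaussian by assumption, and the reward of the round that breaches a budget is not counted, which is a choice the abstract does not specify.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def run_episode(contexts, cluster_of, reward_params, cost_params,
                budgets, policy, noise=0.0, seed=0):
    """Play arms until contexts run out or any resource budget is exceeded.

    cluster_of[arm] is the arm's cluster; a real learner would not see it.
    reward_params[c] is the reward vector of cluster c, and
    cost_params[c][r] is the consumption vector of cluster c for resource r.
    Returns (total_reward, number_of_completed_rounds).
    """
    rng = random.Random(seed)
    spent = [0.0] * len(budgets)
    total_reward = 0.0
    for t, x in enumerate(contexts):
        arm = policy(t, x)
        c = cluster_of[arm]
        # Both outcomes are linear in the context, with cluster-specific weights.
        reward = dot(reward_params[c], x) + rng.gauss(0.0, noise)
        usage = [dot(w, x) + rng.gauss(0.0, noise) for w in cost_params[c]]
        spent = [s + u for s, u in zip(spent, usage)]
        if any(s > b for s, b in zip(spent, budgets)):
            return total_reward, t  # a budget was exceeded: terminate
        total_reward += reward
    return total_reward, len(contexts)

# Two arms in two clusters, one resource with budget 2.0, noiseless outcomes.
contexts = [[1.0, 0.0]] * 5
cluster_of = [0, 1]
reward_params = [[1.0, 0.0], [2.0, 0.0]]
cost_params = [[[0.5, 0.0]], [[1.0, 0.0]]]
greedy = run_episode(contexts, cluster_of, reward_params, cost_params,
                     [2.0], lambda t, x: 1)
frugal = run_episode(contexts, cluster_of, reward_params, cost_params,
                     [2.0], lambda t, x: 0)
```

In this toy instance the high-reward arm exhausts the budget after two rounds and the cheaper arm lasts four, yet both collect the same total reward, which illustrates why reward and consumption have to be learned jointly.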
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Decision-Making and Behavioral Economics
