Non-stationary Bandits with Knapsacks
Shang Liu, Jiashuo Jiang, Xiaocheng Li

TL;DR
This paper investigates non-stationary bandits with knapsacks (BwK), introducing a new non-stationarity measure and deriving regret bounds that account for both resource constraints and environment changes.
Contribution
It proposes a novel global non-stationarity measure for BwK, extending analysis to non-stationary environments and online convex optimization with constraints.
Findings
Established upper and lower regret bounds for non-stationary BwK.
Introduced a new non-stationarity measure suitable for constrained settings.
Extended analysis to online convex optimization with constraints.
Abstract
In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-armed bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time, the decision maker/player chooses an arm to play, receives a reward, and consumes a certain amount of each of the multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or an adversarial environment. Our paper considers a non-stationary environment which continuously interpolates between these two extremes. We first show that the traditional notion of variation budget is insufficient to characterize the non-stationarity of the BwK problem for a sublinear regret due to…
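The interaction protocol described in the abstract (pick an arm, collect a reward, consume resources, stop when the budget runs out) can be sketched as a small simulation. This is an illustrative sketch only, not the paper's algorithm: the names simulate_bwk, drifting_instance, and uniform_policy are hypothetical, rewards and consumptions are assumed to be Bernoulli draws, the drift pattern is invented for the example, and the stopping rule (halt as soon as a draw would overdraw any resource) is one common convention assumed here.

```python
import random


def simulate_bwk(mean_rewards, mean_costs, budgets, policy, seed=0):
    """Play one BwK episode and return (total reward, leftover budgets, rounds played).

    mean_rewards[t][a]  : mean reward of arm a at time t, in [0, 1]
    mean_costs[t][a][j] : mean consumption of resource j by arm a at time t
    budgets[j]          : initial amount of resource j
    All of these are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    remaining = list(budgets)
    total_reward = 0.0
    rounds_played = 0
    for t in range(len(mean_rewards)):
        arm = policy(t, remaining, rng)
        # Assumed Bernoulli reward and per-resource consumption.
        reward = 1.0 if rng.random() < mean_rewards[t][arm] else 0.0
        costs = [1.0 if rng.random() < c else 0.0 for c in mean_costs[t][arm]]
        # Knapsack constraint: stop once any resource would be overdrawn.
        if any(c > r for c, r in zip(costs, remaining)):
            break
        remaining = [r - c for r, c in zip(remaining, costs)]
        total_reward += reward
        rounds_played += 1
    return total_reward, remaining, rounds_played


def drifting_instance(horizon):
    """Two arms, one resource; the mean rewards drift linearly and swap order."""
    mean_rewards, mean_costs = [], []
    for t in range(horizon):
        frac = t / max(horizon - 1, 1)
        mean_rewards.append([0.8 - 0.6 * frac, 0.2 + 0.6 * frac])
        mean_costs.append([[0.5], [0.5]])
    return mean_rewards, mean_costs


def uniform_policy(t, remaining, rng):
    """Baseline that ignores all feedback and picks one of two arms at random."""
    return rng.randrange(2)


rewards, costs = drifting_instance(horizon=200)
total, left, played = simulate_bwk(rewards, costs, budgets=[60.0], policy=uniform_policy)
print(total, left, played)
```

Because the best arm changes over the horizon and the budget is smaller than the horizon, a good policy has to track the drift and also pace its resource use; the uniform baseline does neither and serves only as a reference point.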
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
