Episodic Contextual Bandits with Knapsacks under Conversion Models
Wang Chi Cheung, Zitian Li

TL;DR
This paper introduces an online algorithm for episodic contextual bandits with knapsacks under a shared latent conversion model, achieving sub-linear regret and handling non-stationary contexts and large state spaces.
Contribution
It develops a novel framework for episodic BwK with non-stationary contexts and unbounded state spaces, providing improved regret bounds with unlabeled feature data.
Findings
Achieves sub-linear regret in the number of episodes.
Handles arbitrarily many contexts with an unbounded state space.
Provides improved regret bounds using unlabeled feature data.
Abstract
We study an online setting, where a decision maker (DM) interacts with contextual bandit-with-knapsack (BwK) instances in repeated episodes. These episodes start with different resource amounts, and the contexts' probability distributions are non-stationary within an episode. All episodes share the same latent conversion model, which governs the random outcome contingent upon a request's context and an allocation decision. Our model captures applications such as dynamic pricing of perishable resources with episodic replenishment, and first-price auctions in repeated episodes with different starting budgets. We design an online algorithm that achieves a regret sub-linear in $T$, the number of episodes, assuming access to a confidence bound oracle that achieves an $o(T)$-regret. Such an oracle is readily available from existing contextual bandit literature. We overcome the technical…
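To make the setting in the abstract concrete, here is a minimal simulation sketch of the episodic interaction for the dynamic pricing application. It is an illustration under stated assumptions, not the paper's algorithm: the logistic form of `conversion_prob`, the parameter `theta`, the drifting Gaussian contexts, the budget range, and the placeholder `fixed_price_policy` are all hypothetical choices made only for this example.

```python
import math
import random

def conversion_prob(theta, context, price):
    """Hypothetical shared latent conversion model: logistic in (theta . context - price)."""
    score = sum(t * x for t, x in zip(theta, context)) - price
    return 1.0 / (1.0 + math.exp(-score))

def run_episode(theta, budget, horizon, policy, rng):
    """Play one BwK episode; return (total_reward, units_consumed)."""
    reward, used = 0.0, 0
    for step in range(horizon):
        if used >= budget:          # knapsack constraint: stop when the resource runs out
            break
        # Non-stationary contexts (assumed form): the context mean drifts within the episode.
        drift = step / horizon
        context = [rng.gauss(drift, 1.0), rng.gauss(1.0 - drift, 1.0)]
        price = policy(context, budget - used, horizon - step)
        if rng.random() < conversion_prob(theta, context, price):
            reward += price         # a sale earns the posted price
            used += 1               # and consumes one unit of the resource
    return reward, used

def fixed_price_policy(context, remaining_budget, remaining_steps):
    """Placeholder policy (not the paper's algorithm): always post the same price."""
    return 1.0

def simulate(num_episodes=5, horizon=50, seed=0):
    rng = random.Random(seed)
    theta = [0.8, 0.5]              # latent parameter, unknown to the DM, shared by all episodes
    results = []
    for _ in range(num_episodes):
        budget = rng.randint(5, 20) # each episode starts with a different resource amount
        results.append((budget, *run_episode(theta, budget, horizon, fixed_price_policy, rng)))
    return results
```

Calling `simulate()` returns one `(budget, reward, units_consumed)` tuple per episode. A learning algorithm would replace `fixed_price_policy` with a rule that uses the data gathered across episodes to estimate the shared conversion model.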
Taxonomy
Topics: Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing · Auction Theory and Applications
