Contextual Bandits with Knapsacks for a Conversion Model

Zhen Li; Gilles Stoltz (LMO; CELESTE; HEC Paris)

arXiv:2206.00314 · cs.LG · October 3, 2022 · 1 citation

TL;DR

This paper introduces a new approach for contextual bandits with knapsacks that models the coupling between rewards and costs through customer conversions, providing regret bounds and adaptable policies for sales with discounts.

Contribution

It proposes a novel structure linking rewards and costs via conversions and develops adaptive policies with regret bounds, extending previous linear models.

Findings

01. Achieves regret bounds of order $(\mathrm{OPT}/B)\sqrt{T}$.

02. Develops policies based on upper-confidence estimates.

03. Extends techniques to non-linear reward-cost structures.

Abstract

We consider contextual bandits with knapsacks, with an underlying structure between rewards generated and cost vectors suffered. We do so motivated by sales with commercial discounts. At each round, given the stochastic i.i.d. context $x_t$ and the arm picked $a_t$ (corresponding, e.g., to a discount level), a customer conversion may be obtained, in which case a reward $r(a_t, x_t)$ is gained and vector costs $c(a_t, x_t)$ are suffered (corresponding, e.g., to losses of earnings). Otherwise, in the absence of a conversion, the reward and costs are null. The reward and costs achieved are thus coupled through the binary variable measuring conversion or the absence thereof. This underlying structure between rewards and costs is different from the linear structures considered by Agrawal and Devanur [2016] (but we show that the techniques introduced in the present…
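To make the coupling described in the abstract concrete, here is a minimal simulation sketch of one possible instance of the conversion model. It is illustrative only and is not the authors' policy: the price, the discount levels, the conversion probability `conv_prob`, the one-dimensional cost vector, and the stopping rule (stop before a round that would overspend the budget) are all assumptions chosen for the example.

```python
import random

PRICE = 100.0                  # hypothetical list price
ARMS = [0.0, 0.1, 0.2]         # hypothetical discount levels (the arms)

def conv_prob(arm, x):
    # Hypothetical conversion probability: larger discounts convert more
    # often, especially for price-sensitive customers (x close to 1).
    return min(1.0, 0.1 + 2.0 * arm * x)

def reward(arm, x):
    # Revenue collected if the customer converts.
    return PRICE * (1.0 - arm)

def costs(arm, x):
    # One-dimensional cost vector: earnings lost to the discount.
    return [PRICE * arm]

def play_round(arm, x, rng, p=conv_prob):
    """Draw the binary conversion; reward and costs are both gated by it."""
    converted = rng.random() < p(arm, x)
    if converted:
        return reward(arm, x), costs(arm, x)
    # No conversion: the reward and every cost component are null.
    return 0.0, [0.0] * len(costs(arm, x))

def run(policy, T, budget, seed=0):
    """Play up to T rounds; stop before any cost component exceeds the budget
    (an assumed stopping rule for this sketch)."""
    rng = random.Random(seed)
    total_reward, spent = 0.0, [0.0]
    for _ in range(T):
        x = rng.random()                      # i.i.d. context in [0, 1]
        r, c = play_round(policy(x), x, rng)
        if any(s + ci > budget for s, ci in zip(spent, c)):
            break
        total_reward += r
        spent = [s + ci for s, ci in zip(spent, c)]
    return total_reward, spent
```

For instance, `run(lambda x: 0.2 if x > 0.5 else 0.0, T=1000, budget=500.0)` plays a fixed threshold policy that only discounts for price-sensitive customers. A learning policy would replace the lambda with a rule built from estimates of the conversion probability.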

Videos

Contextual Bandits with Knapsacks for a Conversion Model · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems

Methods: OPT