Contextual Linear Bandits with Delay as Payoff

Mengxiao Zhang; Yingfei Wang; Haipeng Luo

arXiv:2502.12528·cs.LG·February 21, 2025

TL;DR

This paper extends the delay-as-payoff model to contextual linear bandits, proposing an efficient phased elimination algorithm with regret bounds that handle delays proportional to payoffs, and demonstrates its effectiveness through experiments.

Contribution

It introduces a novel phased arm elimination algorithm for contextual linear bandits with delay-as-payoff, achieving near-optimal regret bounds and extending to varying action sets.

Findings

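1. The regret overhead relative to the standard no-delay case is at most $D \Delta_{\max} \log T$, where $T$ is the total horizon, $D$ is the maximum delay, and $\Delta_{\max}$ is the maximum suboptimality gap.

2. When the payoff is a loss, the bound improves further, demonstrating a separation between the loss and reward settings.

3. Experimental results demonstrate the algorithm's effectiveness and superior performance.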

Abstract

A recent work by Schlisselberg et al. (2024) studies a delay-as-payoff model for stochastic multi-armed bandits, where the payoff (either loss or reward) is delayed for a period that is proportional to the payoff itself. While this captures many real-world applications, the simple multi-armed bandit setting limits the practicality of their results. In this paper, we address this limitation by studying the delay-as-payoff model for contextual linear bandits. Specifically, we start from the case with a fixed action set and propose an efficient algorithm whose regret overhead compared to the standard no-delay case is at most $D \Delta_{\max} \log T$, where $T$ is the total horizon, $D$ is the maximum delay, and $\Delta_{\max}$ is the maximum suboptimality gap. When payoff is loss, we also show further improvement of the bound, demonstrating a separation between reward and loss similar to…
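
To make the feedback model concrete, the following is a minimal simulation sketch of delay-as-payoff with a fixed action set and a linear mean payoff. It is not the paper's algorithm. The function name `simulate_delay_as_payoff`, the Gaussian noise, the clipping of payoffs to $[0, 1]$, the rounding of the delay to `ceil(max_delay * payoff)`, and the round-robin policy are all assumptions made for illustration.

```python
import heapq
import math
import random


def simulate_delay_as_payoff(actions, theta, max_delay, horizon, policy, seed=0):
    """Play `horizon` rounds; each payoff is revealed only after a delay
    proportional to the payoff itself (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    pending = []   # min-heap of (arrival_round, played_round, arm, payoff)
    observed = []  # feedback the learner has actually received so far
    for t in range(horizon):
        # Deliver every payoff whose delay has elapsed by round t.
        while pending and pending[0][0] <= t:
            observed.append(heapq.heappop(pending))
        arm = policy(t, observed)
        # Linear mean payoff <action, theta> plus noise (assumed noise model).
        mean = sum(x * w for x, w in zip(actions[arm], theta))
        payoff = min(1.0, max(0.0, mean + rng.gauss(0.0, 0.05)))
        # Delay as payoff: the delay scales with the payoff, capped at max_delay.
        delay = math.ceil(max_delay * payoff)
        heapq.heappush(pending, (t + delay, t, arm, payoff))
    return observed, pending


# Usage: two arms, round-robin play; arm 1 has the larger mean payoff.
actions = [(1.0, 0.0), (0.0, 1.0)]
theta = (0.2, 0.8)
observed, pending = simulate_delay_as_payoff(
    actions, theta, max_delay=10, horizon=200, policy=lambda t, seen: t % 2
)
```

In this sketch a higher payoff always arrives later. If the payoff is read as a reward, the best arm is the slowest to report; if it is read as a loss, the best arm reports fastest, which gives some intuition for why the two settings can behave differently.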

Taxonomy

Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics · Auction Theory and Applications

Methods: Sparse Evolutionary Training