Delay as Payoff in MAB

Ofir Schlisselberg; Ido Cohen; Tal Lancewicki; Yishay Mansour

arXiv:2408.15158 · cs.LG · October 16, 2024

TL;DR

This paper studies a variant of the multi-armed bandit problem where delays directly affect payoffs, providing tight bounds for both cost and reward scenarios and demonstrating their differences through theoretical analysis and experiments.

Contribution

The paper gives the first analysis of delay as cost in MAB, derives optimal regret bounds that improve upon the bounds of previous delay-dependent models, and contrasts the impact of delay in the cost and reward settings.

Findings

01

Optimal regret bounds for delay as costs: $\frac{\log T}{\text{gap}} + d^{*}$

02

Optimal regret bounds for delay as rewards: $\frac{\log T}{\text{gap}} + \bar{d}$

03

Empirical results confirm theoretical improvements

Abstract

In this paper, we investigate a variant of the classical stochastic Multi-armed Bandit (MAB) problem, where the payoff received by an agent (either cost or reward) is both delayed and directly corresponds to the magnitude of the delay. This setting faithfully models many real-world scenarios, such as the time it takes for a data packet to traverse a network given a choice of route (where delay serves as the agent's cost), or a user's time spent on a web page given a choice of content (where delay serves as the agent's reward). Our main contributions are tight upper and lower bounds for both the cost and reward settings. For the case that delays serve as costs, which we are the first to consider, we prove optimal regret that scales as $\sum_{i : \Delta_i > 0} \frac{\log T}{\Delta_i} + d^{*}$, where $T$ is the maximal number of steps, $\Delta_i$ are the sub-optimality gaps and $d^{*}$ is the…
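The cost setting described in the abstract can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions: delays are integers drawn uniformly from a per-arm range, the cost of a pull equals its delay, and that cost becomes visible to the learner only once the delay has elapsed. The names `simulate_delay_as_cost`, `greedy_policy`, and `gap_term` are hypothetical, and `greedy_policy` is a placeholder rather than the algorithm analyzed in the paper. `gap_term` evaluates only the $\sum_{i : \Delta_i > 0} \frac{\log T}{\Delta_i}$ part of the bound, without constants or the additive delay term.

```python
import heapq
import math
import random


def gap_term(mean_costs, horizon):
    """Sum of log(T) / gap_i over arms with a positive gap (cost setting)."""
    best = min(mean_costs)  # with costs, the best arm has the smallest mean
    return sum(math.log(horizon) / (m - best) for m in mean_costs if m > best)


def greedy_policy(t, observed):
    """Placeholder policy for illustration only (not the paper's algorithm)."""
    for arm, samples in enumerate(observed):
        if not samples:  # keep trying an arm until its first feedback arrives
            return arm
    means = [sum(s) / len(s) for s in observed]
    return means.index(min(means))  # lowest observed mean delay


def simulate_delay_as_cost(delay_ranges, horizon, policy, seed=0):
    """Toy environment: pulling an arm at round t incurs cost d (its delay),
    and d is revealed to the learner only from round t + d onward.

    delay_ranges is an assumed list of (low, high) integer bounds per arm,
    with low >= 1.
    """
    rng = random.Random(seed)
    pending = []  # min-heap of (arrival_round, arm, delay)
    observed = [[] for _ in delay_ranges]
    total_cost = 0
    for t in range(horizon):
        # Deliver every piece of feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, d = heapq.heappop(pending)
            observed[arm].append(d)
        arm = policy(t, observed)
        low, high = delay_ranges[arm]
        d = rng.randint(low, high)  # the delay is also the cost of this pull
        total_cost += d
        heapq.heappush(pending, (t + d, arm, d))
    return total_cost, observed


# Example run: arm 0 has mean delay 2, arm 1 has mean delay 6.
total, observed = simulate_delay_as_cost(
    [(1, 3), (4, 8)], horizon=1000, policy=greedy_policy
)
print(total, gap_term([2.0, 6.0], 1000))
```

The run makes one feature of the setting visible: a slow arm is pulled several times before its first cost is revealed, so larger delays both cost more and take longer to learn from.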

Taxonomy

Topics: ICT Impact and Policies · Auction Theory and Applications