Delay as Payoff in MAB
Ofir Schlisselberg, Ido Cohen, Tal Lancewicki, Yishay Mansour

TL;DR
This paper studies a variant of the multi-armed bandit problem where delays directly affect payoffs, providing tight bounds for both cost and reward scenarios and demonstrating their differences through theoretical analysis and experiments.
Contribution
It introduces the first analysis of delays as costs in MAB, deriving optimal regret bounds that improve upon previous delay-dependent models, and compares cost versus reward delay impacts.
Findings
Optimal regret bounds for delay as costs: $\frac{\log T}{\text{gap}} + d^*$
Optimal regret bounds for delay as rewards: $\frac{\log T}{\text{gap}} + \bar{d}$
Empirical results confirm theoretical improvements
Abstract
In this paper, we investigate a variant of the classical stochastic Multi-armed Bandit (MAB) problem, where the payoff received by an agent (either cost or reward) is both delayed and directly corresponds to the magnitude of the delay. This setting faithfully models many real-world scenarios, such as the time it takes for a data packet to traverse a network given a choice of route (where delay serves as the agent's cost), or a user's time spent on a web page given a choice of content (where delay serves as the agent's reward). Our main contributions are tight upper and lower bounds for both the cost and reward settings. For the case that delays serve as costs, which we are the first to consider, we prove optimal regret that scales as $\sum_{i:\Delta_i>0}\frac{\log T}{\Delta_i} + d^*$, where $T$ is the maximal number of steps, $\Delta_i$ are the sub-optimality gaps and $d^*$ is the…
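The delay-as-cost setting described in the abstract can be made concrete with a small simulation. The sketch below is a naive baseline for illustration only, not the algorithm analysed in the paper. It assumes integer delays drawn uniformly from 1 to 2m-1 for an arm with mean delay m, and a lower-confidence-bound rule scaled by the largest possible delay. The function name `simulate_delay_as_cost`, its parameters, and the sampling scheme are all hypothetical.

```python
import math
import random


def simulate_delay_as_cost(mean_delays, horizon, seed=0):
    """Toy delay-as-cost bandit: the cost of a pull IS its delay, and the
    learner only observes that cost once the delay has elapsed.

    mean_delays: integer mean delay per arm (assumed >= 1).
    Returns (pulls per arm, total incurred cost).
    """
    rng = random.Random(seed)
    n_arms = len(mean_delays)
    d_max = 2 * max(mean_delays) - 1  # largest delay the toy sampler can emit
    counts = [0] * n_arms             # observed (completed) delays per arm
    sums = [0.0] * n_arms             # sum of observed delays per arm
    pulls = [0] * n_arms              # pulls per arm, observed or not
    pending = []                      # (arrival_step, arm, delay) not yet seen
    total_cost = 0.0

    for t in range(horizon):
        # Reveal every piece of feedback whose delay has elapsed by step t.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, arm, delay in arrived:
            counts[arm] += 1
            sums[arm] += delay

        unobserved = [a for a in range(n_arms) if counts[a] == 0]
        if unobserved:
            # No feedback yet for some arm: keep sampling the least-pulled one.
            arm = min(unobserved, key=lambda a: pulls[a])
        else:
            # Costs are minimised, so pick the smallest lower confidence bound.
            def lcb(a):
                bonus = d_max * math.sqrt(2.0 * math.log(t + 1) / counts[a])
                return sums[a] / counts[a] - bonus
            arm = min(range(n_arms), key=lcb)

        # Assumed delay model: uniform integers with mean mean_delays[arm].
        delay = rng.randint(1, 2 * mean_delays[arm] - 1)
        pulls[arm] += 1
        total_cost += delay
        pending.append((t + delay, arm, delay))

    return pulls, total_cost


pulls, total_cost = simulate_delay_as_cost([2, 10], horizon=2000, seed=0)
print(pulls, total_cost)
```

Note that this baseline ignores feedback that is still pending. In the cost setting, a pull whose feedback has not yet arrived already reveals that its delay is at least the elapsed time, which is information this sketch does not exploit.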
Taxonomy
Topics: ICT Impact and Policies · Auction Theory and Applications
