Continuous-time multi-armed bandits under random intervention times
Kei Noba, José Luis Pérez, Kazutoshi Yamazaki, Qingyuan Zhang

TL;DR
This paper studies multi-armed bandit problems with actions occurring at random times, providing explicit Gittins index characterizations for various stochastic processes, supported by numerical experiments.
Contribution
It offers explicit Gittins index formulas for arms modeled by Lévy processes under random intervention times, extending classical bandit theory.
Findings
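Explicit characterization of the Gittins index for arms evolving as Lévy processes.
Under exponential inter-arrival times, the Gittins index is expressed via the scale function (spectrally negative and reflected spectrally negative Lévy processes) or via diffusion characteristics (diffusion processes).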
Numerical validation of theoretical index formulas.
Abstract
This paper examines multi-armed bandits in which actions are taken at random discrete times. The model consists of independent arms. When an arm is operated, it must remain active for a random duration, modeled by the inter-arrival time of a (possibly arm-dependent) renewal process. For arms evolving as a Lévy process, we provide an explicit characterization of the Gittins index, which is known to yield an optimal strategy. Furthermore, when the inter-arrival times are exponential and the arms evolve as either a spectrally negative Lévy process, a reflected spectrally negative Lévy process, or a diffusion process, the Gittins index is explicitly characterized in terms of the scale function or diffusion characteristics, respectively. Numerical experiments are performed to support the theoretical results.
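The abstract states that, for spectrally negative Lévy arms with exponential inter-arrival times, the Gittins index is written in terms of the scale function. The index formula itself is not reproduced here. As a hedged illustration of that one ingredient, the sketch below evaluates the q-scale function W^{(q)} for the simplest spectrally negative Lévy process, Brownian motion with drift X_t = μt + σB_t with σ > 0. It uses the standard closed form, which is characterized by the Laplace transform ∫_0^∞ e^{-θx} W^{(q)}(x) dx = 1/(ψ(θ) − q) for θ > Φ(q). The choice of process and all function names (`laplace_exponent`, `right_inverse`, `scale_function`, `laplace_transform_numeric`) are illustrative assumptions, not taken from the paper.

```python
import math

# Illustrative sketch only: Brownian motion with drift is an assumed example
# process; the paper's Gittins index formula is NOT implemented here.


def laplace_exponent(theta: float, mu: float, sigma: float) -> float:
    """psi(theta) = log E[exp(theta * X_1)] for X_t = mu*t + sigma*B_t."""
    return mu * theta + 0.5 * sigma**2 * theta**2


def right_inverse(q: float, mu: float, sigma: float) -> float:
    """Phi(q): the largest root of psi(theta) = q, for q >= 0 and sigma > 0."""
    return (-mu + math.sqrt(mu**2 + 2.0 * q * sigma**2)) / sigma**2


def scale_function(x: float, q: float, mu: float, sigma: float) -> float:
    """q-scale function W^{(q)}(x) of Brownian motion with drift (sigma > 0).

    Uses W^{(q)}(x) = (2 / sigma^2) * exp(-omega x) * sinh(delta x) / delta,
    with omega = mu / sigma^2 and delta = sqrt(mu^2 + 2 q sigma^2) / sigma^2.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x < 0:
        return 0.0  # by convention W^{(q)} vanishes on (-inf, 0)
    omega = mu / sigma**2
    delta = math.sqrt(mu**2 + 2.0 * q * sigma**2) / sigma**2
    if delta == 0.0:
        # q = 0 and mu = 0: the limit of sinh(delta x) / delta is x
        return 2.0 * x / sigma**2
    return 2.0 / (sigma**2 * delta) * math.exp(-omega * x) * math.sinh(delta * x)


def laplace_transform_numeric(theta, q, mu, sigma, upper=40.0, n=40_000):
    """Trapezoid-rule approximation of int_0^upper exp(-theta x) W^{(q)}(x) dx."""
    h = upper / n

    def f(x):
        return math.exp(-theta * x) * scale_function(x, q, mu, sigma)

    total = 0.5 * (f(0.0) + f(upper))
    total += sum(f(k * h) for k in range(1, n))
    return total * h


if __name__ == "__main__":
    mu, sigma, q, theta = 0.5, 1.0, 0.1, 2.0  # theta must exceed Phi(q)
    print("Phi(q)      =", right_inverse(q, mu, sigma))
    print("numeric     =", laplace_transform_numeric(theta, q, mu, sigma))
    print("1/(psi - q) =", 1.0 / (laplace_exponent(theta, mu, sigma) - q))
```

Running the script compares a numerical Laplace transform of W^{(q)} against 1/(ψ(θ) − q), which is a quick sanity check that the closed form matches the defining identity. For general spectrally negative Lévy processes the scale function rarely has such a simple form and is usually computed by numerical Laplace inversion.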
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Reinforcement Learning in Robotics
