Non-Stationary Restless Multi-Armed Bandits with Provable Guarantee
Yu-Heng Hung, Ping-Chun Hsieh, Kai Wang

TL;DR
This paper introduces a new algorithm for non-stationary restless multi-armed bandits that effectively learns and adapts to changing dynamics, providing the first theoretical regret guarantees in this challenging setting.
Contribution
It proposes a novel algorithm that combines sliding-window RL with UCB for non-stationary RMABs with bounded variation, and establishes the first regret bound for such problems.
Findings
Achieves a regret bound of $\tilde{O}(N^2 B^{1/4} T^{3/4})$
Integrates sliding window RL with UCB for dynamic environment adaptation
Provides foundational theoretical analysis for non-stationary RMABs
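One way to read the stated bound is to divide it by the horizon. The worked step below is only a sketch: it assumes $T$ is the number of rounds, $N$ the number of arms, and $B$ the variation budget, and it ignores the logarithmic factors hidden by $\tilde{O}$.

```latex
\[
  \frac{\tilde{O}\!\left(N^{2} B^{1/4} T^{3/4}\right)}{T}
  \;=\; \tilde{O}\!\left(N^{2} B^{1/4} T^{-1/4}\right)
\]
```

Under those assumptions, with $N$ fixed, the per-round regret vanishes as $T$ grows whenever $B$ grows more slowly than $T$ (up to logarithmic factors), which is the sense in which the bound is sublinear.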
Abstract
Online restless multi-armed bandits (RMABs) typically assume that each arm follows a stationary Markov Decision Process (MDP) with fixed state transitions and rewards. However, in real-world applications like healthcare and recommendation systems, these assumptions often break due to non-stationary dynamics, posing significant challenges for traditional RMAB algorithms. In this work, we specifically consider $N$-armed RMABs with non-stationary transitions constrained by a bounded variation budget $B$. Our proposed algorithm integrates sliding window reinforcement learning (RL) with an upper confidence bound (UCB) mechanism to simultaneously learn transition dynamics and their variations. We further establish that the algorithm achieves a regret bound of $\tilde{O}(N^2 B^{1/4} T^{3/4})$ by leveraging a relaxed definition of regret, providing a foundational…
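The idea of pairing a sliding window with a confidence bound can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name, the `window` and `delta` parameters, and the Hoeffding-style radius are all assumptions chosen for illustration. It estimates one arm's transition probabilities from only the most recent samples and reports how uncertain that estimate is.

```python
from collections import deque
import math


class SlidingWindowTransitionEstimator:
    """Hypothetical helper: per-arm transition estimates from recent samples only."""

    def __init__(self, n_states, n_actions, window):
        self.n_states = n_states
        self.n_actions = n_actions
        # A bounded deque drops the oldest sample once `window` is exceeded.
        self.buffer = deque(maxlen=window)

    def update(self, state, action, next_state):
        """Record one observed transition (state, action) -> next_state."""
        self.buffer.append((state, action, next_state))

    def estimate(self, state, action, delta=0.05):
        """Return (p_hat, radius) for the given state-action pair.

        p_hat is the empirical next-state distribution inside the window.
        radius is an assumed Hoeffding-style confidence width, capped at 1.
        """
        counts = [0] * self.n_states
        for s, a, s_next in self.buffer:
            if s == state and a == action:
                counts[s_next] += 1
        n = sum(counts)
        if n == 0:
            # No data in the window: uniform guess with maximal uncertainty.
            return [1.0 / self.n_states] * self.n_states, 1.0
        p_hat = [c / n for c in counts]
        radius = min(1.0, math.sqrt(math.log(2.0 / delta) / (2.0 * n)))
        return p_hat, radius


# Usage: the dynamics change halfway through; only the last 4 samples are kept.
est = SlidingWindowTransitionEstimator(n_states=2, n_actions=2, window=4)
for _ in range(4):
    est.update(0, 1, 1)  # older regime: action 1 in state 0 leads to state 1
for _ in range(4):
    est.update(0, 1, 0)  # newer regime: the same pair now leads to state 0
p_hat, radius = est.estimate(0, 1)
print(p_hat, radius)
```

In this sketch a shorter window forgets outdated dynamics faster but leaves fewer samples per state-action pair, so the confidence radius is wider. An optimistic planner would then choose actions using the most favorable transition model within that radius.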
