Non-Stationary Restless Multi-Armed Bandits with Provable Guarantee

Yu-Heng Hung; Ping-Chun Hsieh; Kai Wang

arXiv:2508.10804·cs.LG·August 15, 2025

TL;DR

This paper introduces a new algorithm for non-stationary restless multi-armed bandits that effectively learns and adapts to changing dynamics, providing the first theoretical regret guarantees in this challenging setting.

Contribution

It proposes \rmab, a novel algorithm that combines sliding-window RL with UCB for non-stationary RMABs under a bounded variation budget, and establishes the first regret bound for such problems.

Findings

01

Achieves a regret bound of $\tilde{O}(N^{2} B^{1/4} T^{3/4})$

02

Integrates sliding window RL with UCB for dynamic environment adaptation

03

Provides foundational theoretical analysis for non-stationary RMABs
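
As a reading aid for finding 01, dividing the stated bound by the horizon $T$ shows how the time-averaged regret scales. This is plain arithmetic on the bound as quoted, not a result taken from the paper, and it applies to the relaxed notion of regret the abstract mentions:

$$
\frac{\tilde{O}\left(N^{2} B^{1/4} T^{3/4}\right)}{T} = \tilde{O}\left(N^{2} \left(\frac{B}{T}\right)^{1/4}\right)
$$

For a fixed number of arms $N$, the average regret per step therefore vanishes whenever the variation budget $B$ grows sufficiently slower than $T$, up to the logarithmic factors hidden by $\tilde{O}$.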

Abstract

Online restless multi-armed bandits (RMABs) typically assume that each arm follows a stationary Markov Decision Process (MDP) with fixed state transitions and rewards. However, in real-world applications like healthcare and recommendation systems, these assumptions often break due to non-stationary dynamics, posing significant challenges for traditional RMAB algorithms. In this work, we specifically consider $N$-armed RMABs with non-stationary transitions constrained by a bounded variation budget $B$. Our proposed \rmab algorithm integrates sliding window reinforcement learning (RL) with an upper confidence bound (UCB) mechanism to simultaneously learn transition dynamics and their variations. We further establish that \rmab achieves an $O(N^{2} B^{\frac{1}{4}} T^{\frac{3}{4}})$ regret bound by leveraging a relaxed definition of regret, providing a foundational…
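
To make the sliding-window idea in the abstract concrete, here is a minimal Python sketch of a windowed transition estimator for a single arm. It is an illustration under stated assumptions, not the paper's algorithm: the class name `SlidingWindowTransitionEstimator`, its methods, and the Hoeffding-style confidence radius are all hypothetical choices made for this example, and the paper's actual confidence sets and planning step are not reproduced here.

```python
from collections import deque
import math


class SlidingWindowTransitionEstimator:
    """Hypothetical helper: empirical transition estimates for one arm,
    computed only from the most recent `window` observed transitions."""

    def __init__(self, n_states, window):
        self.n_states = n_states
        self.window = window
        self.samples = deque()  # (state, action, next_state), oldest first
        self.counts = {}        # (state, action, next_state) -> count in window

    def update(self, state, action, next_state):
        """Record a transition and forget the oldest one if the window is full."""
        key = (state, action, next_state)
        self.samples.append(key)
        self.counts[key] = self.counts.get(key, 0) + 1
        if len(self.samples) > self.window:
            old = self.samples.popleft()
            self.counts[old] -= 1

    def visits(self, state, action):
        """Number of in-window visits to the pair (state, action)."""
        return sum(self.counts.get((state, action, s2), 0)
                   for s2 in range(self.n_states))

    def estimate(self, state, action):
        """Empirical next-state distribution; uniform if the pair is unvisited."""
        n = self.visits(state, action)
        if n == 0:
            return [1.0 / self.n_states] * self.n_states
        return [self.counts.get((state, action, s2), 0) / n
                for s2 in range(self.n_states)]

    def radius(self, state, action, delta=0.05):
        """Assumed Hoeffding-style bonus: shrinks as in-window visits grow."""
        n = max(1, self.visits(state, action))
        return math.sqrt(2.0 * math.log(1.0 / delta) / n)


# Usage: the arm's dynamics change halfway through; with window=4 the
# estimate reflects only the four most recent transitions.
est = SlidingWindowTransitionEstimator(n_states=2, window=4)
for _ in range(4):
    est.update(0, 1, 0)   # early regime: (state 0, action 1) -> state 0
for _ in range(4):
    est.update(0, 1, 1)   # later regime: (state 0, action 1) -> state 1
print(est.estimate(0, 1), est.radius(0, 1))
```

The window length trades off two errors: a short window tracks drift quickly but leaves few samples and a wide confidence radius, while a long window tightens the radius but mixes in stale transitions. A UCB-style learner would act optimistically with respect to the estimate plus this radius.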
