A Modularized Framework for Piecewise-Stationary Restless Bandits
Kuan-Ta Li, Chia-Chun Lin, Ping-Chun Hsieh, Yu-Chih Huang

TL;DR
This paper introduces a flexible modular framework for piecewise-stationary restless bandits that combines change detection with existing algorithms, achieving near-oracle regret bounds without prior knowledge of change points.
Contribution
It proposes a novel, plug-and-play framework integrating change detection with base algorithms, enabling efficient adaptation to unknown environmental changes in restless bandits.
Findings
Achieves a regret bound of $\tilde{O}(\sqrt{LMKT})$ under the proposed framework.
The framework outperforms non-adaptive base solvers in simulations.
Its performance is close to the oracle's when handling environmental changes.
Abstract
We study the piecewise-stationary restless multi-armed bandit (PS-RMAB) problem, where each arm evolves as a Markov chain but mean rewards may change across unknown segments. To address the resulting exploration-detection delay trade-off, we propose a modular framework that integrates arbitrary RMAB base algorithms with change detection and a novel diminishing exploration mechanism. This design enables flexible plug-and-play use of existing solvers and detectors, while efficiently adapting to mean changes without prior knowledge of their number. To evaluate performance, we introduce a refined regret notion that measures the excess regret due to exploration and detection, benchmarked against an oracle that restarts the base algorithm at the true change points. Under this metric, we prove a regret bound of $\tilde{O}(\sqrt{LMKT})$, where denotes the maximum mixing…
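The plug-and-play structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: `GreedyMeanSolver`, `WindowMeanDetector`, `swap_env`, and `run` are hypothetical names, the detector is a simple two-window mean comparison, and the forced-exploration probability `c / sqrt(steps since last restart)` is only one assumed way to make exploration diminish. For brevity, the toy environment pulls a single arm per round and returns deterministic rewards rather than Markovian ones.

```python
import math
import random
from collections import deque


class GreedyMeanSolver:
    """Toy stand-in for a base solver (assumption): plays the best empirical mean."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset()

    def reset(self):
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms

    def select(self):
        # Try every arm once, then exploit the highest empirical mean.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        return max(range(self.n_arms), key=lambda a: self.sums[a] / self.counts[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


class WindowMeanDetector:
    """Toy detector (assumption): alarm when the two halves of a window disagree."""

    def __init__(self, window=40, threshold=0.4):
        self.window = window
        self.threshold = threshold
        self.reset()

    def reset(self):
        self.buf = deque(maxlen=self.window)

    def update(self, reward):
        self.buf.append(reward)
        if len(self.buf) < self.window:
            return False
        half = self.window // 2
        data = list(self.buf)
        older = sum(data[:half]) / half
        newer = sum(data[half:]) / half
        return abs(newer - older) > self.threshold


def swap_env(t, arm):
    """Hypothetical two-arm environment whose means swap at t = 500."""
    best = 0 if t < 500 else 1
    return 0.9 if arm == best else 0.1


def run(env_step, n_arms, horizon, solver, detectors, c=1.0, seed=0):
    """Wrap a base solver with per-arm detectors and diminishing forced exploration."""
    rng = random.Random(seed)
    restarts, since_restart, total = [], 0, 0.0
    for t in range(horizon):
        since_restart += 1
        # Assumed schedule: exploration probability decays after each restart.
        explore_prob = min(1.0, c / math.sqrt(since_restart))
        if rng.random() < explore_prob:
            arm = rng.randrange(n_arms)   # forced exploration feeds the detectors
        else:
            arm = solver.select()         # otherwise defer to the base solver
        reward = env_step(t, arm)
        total += reward
        solver.update(arm, reward)
        if detectors[arm].update(reward):
            # Alarm: restart the base solver, all detectors, and the schedule.
            solver.reset()
            for d in detectors:
                d.reset()
            since_restart = 0
            restarts.append(t)
    return total, restarts
```

Calling `run(swap_env, 2, 1000, GreedyMeanSolver(2), [WindowMeanDetector() for _ in range(2)])` returns the cumulative reward and the list of rounds at which a restart was triggered. Because the solver and detectors are touched only through `select`, `update`, and `reset`, either component can be swapped for another implementation without changing the loop, which is the modularity the abstract emphasizes.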
