Online Learning of Whittle Indices for Restless Bandits with Non-Stationary Transition Kernels
Md Kamran Chowdhury Shisher, Vishrant Tripathi, Mung Chiang, Christopher G. Brinton

TL;DR
This paper introduces an adaptive online algorithm for resource allocation in restless bandit problems whose dynamics are unknown and non-stationary. The algorithm builds on Whittle index policies, keeps their low computational cost, and comes with a sub-linear dynamic regret guarantee.
Contribution
The paper proposes a Sliding-Window Online Whittle (SW-Whittle) policy that adapts to unknown, time-varying transition kernels, shows that it achieves sub-linear dynamic regret, and gives a method for tuning the window size online.
Findings
SW-Whittle achieves sub-linear dynamic regret in the number of episodes.
It outperforms baselines in non-stationary environments.
It adapts to unknown variation budgets.
Abstract
The restless multi-armed bandit (RMAB) framework is a popular approach to solving resource allocation problems in networked systems. In this paper, we study optimal resource allocation in RMABs facing unknown and non-stationary dynamics. Solving RMABs optimally is known to be PSPACE-hard even with full knowledge of model parameters. While Whittle index policies offer asymptotic optimality with low computational cost, they require access to stationary transition kernels, an unrealistic assumption in many modern networking applications. To address this challenge, we propose a Sliding-Window Online Whittle (SW-Whittle) policy that remains computationally efficient while adapting to time-varying kernels. Through theoretical analysis, we show that our algorithm achieves sub-linear dynamic regret with respect to the number of episodes. We further address the important case where the variation…
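The sliding-window idea named in the abstract can be illustrated with a small sketch. This is not the authors' SW-Whittle algorithm: it shows only the generic step of estimating one arm's transition kernel from the most recent episodes, so that stale data from an older regime is forgotten. The class name `SlidingWindowKernelEstimator`, its parameters, and the uniform fallback for unvisited state-action pairs are assumptions made for illustration. Confidence bounds and Whittle index computation are omitted.

```python
from collections import deque

import numpy as np


class SlidingWindowKernelEstimator:
    """Hypothetical helper: empirical transition kernel over the last `window` episodes."""

    def __init__(self, n_states, n_actions, window):
        self.n_states = n_states
        self.n_actions = n_actions
        # deque with maxlen drops the oldest episode automatically.
        self.episodes = deque(maxlen=window)

    def add_episode(self, transitions):
        """Store one episode as a list of (state, action, next_state) tuples."""
        self.episodes.append(list(transitions))

    def estimate(self):
        """Return P_hat[s, a, s'] computed from episodes inside the window only."""
        counts = np.zeros((self.n_states, self.n_actions, self.n_states))
        for episode in self.episodes:
            for s, a, s_next in episode:
                counts[s, a, s_next] += 1
        totals = counts.sum(axis=2, keepdims=True)
        # Assumption: unvisited (s, a) pairs fall back to a uniform distribution.
        uniform = np.full_like(counts, 1.0 / self.n_states)
        return np.where(totals > 0, counts / np.maximum(totals, 1), uniform)


# Usage: the kernel changes after the first episode; a window of 2 forgets it.
estimator = SlidingWindowKernelEstimator(n_states=2, n_actions=1, window=2)
estimator.add_episode([(0, 0, 0)] * 4)  # old regime: state 0 stays at 0
estimator.add_episode([(0, 0, 1)] * 2)  # new regime: state 0 moves to 1
estimator.add_episode([(0, 0, 1)] * 2)
p_hat = estimator.estimate()
print(p_hat[0, 0])
```

A short window tracks change quickly but averages over fewer samples, while a long window does the reverse. That trade-off is why tuning the window size matters when the amount of variation is unknown.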
