Tabular and Deep Learning for the Whittle Index
Francisco Robledo Relaño (LMAP, UPPA, UPV/EHU), Vivek Borkar (EE-IIT), Urtzi Ayesta (IRIT-RMESS, UPV/EHU, CNRS), Konstantin Avrachenkov (Inria)

TL;DR
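This paper introduces two reinforcement learning algorithms, the tabular QWI and the neural-network-based QWINN, that learn the Whittle index for Restless Multi-Armed Bandit Problems (RMABPs) using two time-scales. QWI is proven to converge to the true Whittle indices, and QWINN scales to large state spaces.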
Contribution
The paper proves that QWI, a tabular two-time-scale algorithm, converges to the true Whittle indices, and presents QWINN, a neural-network adaptation of QWI that scales to large state spaces.
Findings
QWI converges to true Whittle indices.
QWINN scales to large state spaces effectively.
Both algorithms outperform existing methods in convergence speed.
Abstract
The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABPs). In this paper we present QWI and QWINN, two reinforcement learning algorithms, respectively tabular and deep, to learn the Whittle index for the total discounted criterion. The key feature is the use of two time-scales: a faster one to update the state-action Q-values, and a relatively slower one to update the Whittle indices. In our main theoretical result we show that QWI, which is a tabular implementation, converges to the real Whittle indices. We then present QWINN, an adaptation of the QWI algorithm using neural networks to compute the Q-values on the faster time-scale, which is able to extrapolate information from one state to another and scales naturally to large…
Taxonomy
Methods: Q-Learning
