Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation
Guojun Xiong, Jian Li

TL;DR
This paper introduces Neural-Q-Whittle, a neural network-based Q-learning algorithm for restless multi-armed bandits, providing the first finite-time convergence analysis with a rate of O(1/k^{2/3}) under Markov data.
Contribution
The paper offers a finite-time convergence analysis of a neural-network-based Whittle index Q-learning algorithm for RMABs, addressing a gap in the understanding of the algorithm's non-asymptotic behavior.
Findings
Achieves a convergence rate of O(1/k^{2/3})
Provides finite-time analysis under Markov data
Leverages Lyapunov drift for coupled parameter analysis
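To give a feel for the O(1/k^{2/3}) rate, here is a short worked calculation. It assumes, purely for illustration, that the error bound behaves like C/k^{2/3} for some constant C (a hypothetical symbol not given in this summary) and asks how many more iterations are needed to shrink that bound by a factor of 10.

```latex
\[
\frac{C}{k_2^{2/3}} = \frac{1}{10}\cdot\frac{C}{k_1^{2/3}}
\quad\Longrightarrow\quad
\left(\frac{k_2}{k_1}\right)^{2/3} = 10
\quad\Longrightarrow\quad
\frac{k_2}{k_1} = 10^{3/2} \approx 31.6
\]
```

Under that assumption, each tenfold reduction of the bound costs roughly 32 times as many iterations.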
Abstract
The Whittle index policy is a heuristic for the intractable restless multi-armed bandit (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle, a Whittle-index-based Q-learning algorithm for RMAB with neural network function approximation. It is an example of nonlinear two-timescale stochastic approximation, with Q-function values updated on a faster timescale and Whittle indices on a slower timescale. Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which couples neural networks with two-timescale Q-learning, remains largely unclear. This paper provides a finite-time analysis of Neural-Q-Whittle, where data are generated from a Markov chain and the Q-function is approximated by a ReLU neural network. Our analysis leverages a Lyapunov drift…
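The two-timescale structure described in the abstract can be illustrated with a minimal sketch. This is not the paper's Neural-Q-Whittle algorithm: the ReLU network is swapped for a plain lookup table, and the single-arm toy model, the discount factor, the step-size exponents, the epsilon-greedy exploration, and every name below (`q_whittle_tabular`, `P`, `R`, `s_ref`) are assumptions made for illustration. It only shows the general pattern of updating Q-values with a larger step size and the index estimate with a smaller one along a single Markov trajectory.

```python
import numpy as np

def q_whittle_tabular(P, R, s_ref, gamma=0.9, steps=20000, eps=0.2, seed=0):
    """Tabular two-timescale sketch for one arm (illustrative assumption).

    P[a, s, s2] : transition probability from s to s2 under action a (0 = passive, 1 = active)
    R[s, a]     : reward for taking action a in state s
    s_ref       : state whose index estimate is being learned
    """
    rng = np.random.default_rng(seed)
    n_states = R.shape[0]
    Q = np.zeros((n_states, 2))
    lam = 0.0                       # running estimate of the index at s_ref
    s = int(rng.integers(n_states))
    for k in range(1, steps + 1):
        alpha = 1.0 / k ** 0.6      # faster timescale (Q-values)
        eta = 1.0 / k               # slower timescale (index); eta / alpha -> 0
        # epsilon-greedy action choice along one Markov trajectory
        if rng.random() < eps:
            a = int(rng.integers(2))
        else:
            a = int(np.argmax(Q[s]))
        s_next = int(rng.choice(n_states, p=P[a, s]))
        # the passive action (a = 0) earns the subsidy lam on top of its reward
        target = R[s, a] + (1 - a) * lam + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        # move lam toward the subsidy that makes both actions equally good at s_ref
        lam += eta * (Q[s_ref, 1] - Q[s_ref, 0])
        s = s_next
    return Q, lam

# Hypothetical two-state arm, used only to exercise the function.
P = np.array([[[0.7, 0.3], [0.4, 0.6]],    # passive transitions
              [[0.2, 0.8], [0.1, 0.9]]])   # active transitions
R = np.array([[0.0, 0.2],
              [0.0, 1.0]])
Q, lam = q_whittle_tabular(P, R, s_ref=0, steps=5000)
print("index estimate at s_ref=0:", lam)
```

Because the index step size shrinks faster than the Q-value step size, the Q-table sees a nearly constant subsidy between its own updates, which is the intuition behind the faster and slower timescales.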
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management
Methods: Q-Learning
