Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation

Guojun Xiong; Jian Li

arXiv:2310.02147 · cs.LG · October 4, 2023 · 1 citation

TL;DR

This paper introduces Neural-Q-Whittle, a neural network-based Q-learning algorithm for restless multi-armed bandits, providing the first finite-time convergence analysis with a rate of O(1/k^{2/3}) under Markov data.

Contribution

It offers a novel finite-time convergence analysis of a neural network-based Whittle index Q-learning algorithm for RMABs, addressing the gap in understanding its non-asymptotic behavior.

Findings

1. Achieves a convergence rate of O(1/k^{2/3})
2. Provides a finite-time analysis under Markovian data
3. Leverages Lyapunov drift to analyze the coupled parameters
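Read at face value, an error bound of the form C/k^{2/3} (with C a hypothetical constant standing in for the problem-dependent terms of the analysis) translates into an iteration budget as follows. This is plain arithmetic on the stated rate, not a result taken from the paper:

```latex
\frac{C}{k^{2/3}} \le \varepsilon \iff k \ge \left(\frac{C}{\varepsilon}\right)^{3/2},
\qquad
\frac{\bigl(C/(\varepsilon/2)\bigr)^{3/2}}{\bigl(C/\varepsilon\bigr)^{3/2}} = 2^{3/2} \approx 2.83
```

So halving the target accuracy costs roughly 2.83 times as many iterations.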

Abstract

The Whittle index policy is a heuristic for the intractable restless multi-armed bandit (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMABs with neural network function approximation, which is an instance of nonlinear two-timescale stochastic approximation: Q-function values are updated on a faster timescale and Whittle indices on a slower timescale. Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which couples neural networks with two-timescale Q-learning, largely remains unclear. This paper provides a finite-time analysis of Neural-Q-Whittle, where data are generated from a Markov chain and the Q-function is approximated by a ReLU neural network. Our analysis leverages a Lyapunov drift…
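The two-timescale structure described in the abstract can be illustrated with a minimal sketch. It is a simplification, not the paper's Neural-Q-Whittle: it assumes a Q-table in place of the ReLU network, a discounted criterion, and a single arm whose state `ref_state` gets an index estimate. A subsidy `lam` is paid for the passive action; Q-values move with the fast step size `alpha`, while `lam` moves with the slower `beta`, rising while activating `ref_state` still looks better than resting (the Whittle index is the subsidy at which the two actions tie). All function names, step-size exponents, and the toy arm are hypothetical choices for illustration.

```python
import numpy as np


def whittle_q_learning(P, R, ref_state, gamma=0.5, n_steps=50_000, seed=0):
    """Two-timescale tabular Q-learning for the Whittle index of one state.

    Illustrative sketch only: a Q-table replaces the ReLU network and the
    criterion is discounted. P[s, a, s'] are transition probabilities
    (a=0 passive, a=1 active), R[s, a] are rewards. Returns (lam, Q).
    """
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    cdf = np.cumsum(P, axis=2)                    # used to sample s' ~ P[s, a, :]
    actions = rng.integers(0, 2, size=n_steps)    # uniform exploratory behavior policy
    u = rng.random(n_steps)
    Q = np.zeros((n_states, 2))
    lam, s = 0.0, 0
    for k in range(n_steps):                      # one Markovian trajectory
        a = int(actions[k])
        s_next = min(int(np.searchsorted(cdf[s, a], u[k], side="right")), n_states - 1)
        alpha = 1.0 / (k + 1) ** 0.6              # fast timescale: Q-values
        beta = 0.5 / (k + 1) ** 0.85              # slow timescale: subsidy
        # The passive action (a=0) collects the subsidy lam on top of its reward.
        target = R[s, a] + (1 - a) * lam + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        # Raise lam while activating ref_state beats resting; lower it otherwise.
        lam += beta * (Q[ref_state, 1] - Q[ref_state, 0])
        s = s_next
    return lam, Q


def exact_whittle_index(P, R, ref_state, gamma=0.5, lo=-10.0, hi=10.0, tol=1e-6):
    """Reference value: bisection on the subsidy, solving each subsidized MDP
    by value iteration. Assumes the arm is indexable."""
    def gap(lam):
        V = np.zeros(P.shape[0])
        for _ in range(500):
            Qv = R + np.array([lam, 0.0]) + gamma * (P @ V)   # Qv[s, a]
            V = Qv.max(axis=1)
        return Qv[ref_state, 1] - Qv[ref_state, 0]

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:      # active still preferred: subsidy too small
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy two-state arm (hypothetical): the dynamics ignore the action.
P_row = np.array([[0.7, 0.3], [0.4, 0.6]])
P = np.stack([P_row, P_row], axis=1)              # shape (S, A, S')
R = np.array([[0.0, 0.3], [0.0, 0.8]])
lam_hat, Q_hat = whittle_q_learning(P, R, ref_state=1)
lam_star = exact_whittle_index(P, R, ref_state=1)
```

Because the toy dynamics do not depend on the action, the future value cancels in Q(1, 1) - Q(1, 0), so the index of state 1 is simply R[1, 1] - R[1, 0] = 0.8; this gives an easy sanity check for both functions. Keeping `beta` decaying faster than `alpha` is what lets the Q-table approximately track the current subsidy before `lam` moves appreciably.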

Videos

Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management

Methods: Q-Learning