Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes

Vishesh Mittal; Rahul Meshram; Surya Prakash

arXiv:2409.04605 · cs.LG · September 10, 2024

Open Access

TL;DR

This paper develops and analyzes Whittle index learning algorithms for restless bandits using Q-learning, deep Q-networks, and function approximation, with a focus on constant stepsize two-timescale stochastic approximation.

Contribution

It introduces a two-timescale stochastic approximation framework for Whittle index learning with constant stepsizes, including extensions to deep Q-networks and function approximation.
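A minimal sketch of what a two-timescale index learning loop with constant stepsizes can look like for a single arm. This is an illustration under stated assumptions, not necessarily the authors' exact scheme: it assumes a discounted reward criterion, a subsidy `lam` paid for the passive action, epsilon-greedy exploration, and an index update that drives `Q[ref_state, 1] - Q[ref_state, 0]` to zero. The function name `learn_whittle_index`, the toy arm, and all parameter values are hypothetical.

```python
import numpy as np

def learn_whittle_index(P, R, ref_state, gamma=0.9, alpha=0.1, beta=0.001,
                        eps=0.2, n_steps=100_000, seed=0):
    """Estimate the Whittle index of `ref_state` for one restless arm.

    P[a] is the state transition matrix under action a (0 = passive, 1 = active),
    R[s, a] is the reward. alpha (fast) and beta (slow) are constant stepsizes.
    """
    rng = np.random.default_rng(seed)
    n_states = R.shape[0]
    Q = np.zeros((n_states, 2))
    lam = 0.0                      # current index (subsidy) estimate
    s = int(rng.integers(n_states))
    for _ in range(n_steps):
        # Epsilon-greedy exploration (assumed choice for this sketch).
        if rng.random() < eps:
            a = int(rng.integers(2))
        else:
            a = int(np.argmax(Q[s]))
        s_next = int(rng.choice(n_states, p=P[a][s]))
        # Faster timescale: asynchronous Q-learning with the index held fixed.
        reward = R[s, a] + lam * (1 - a)   # subsidy only for the passive action
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        # Slower timescale: move the index toward indifference at ref_state.
        lam += beta * (Q[ref_state, 1] - Q[ref_state, 0])
        s = s_next
    return lam, Q

# Toy arm (hypothetical): transitions do not depend on the action, so
# Q[s, 1] - Q[s, 0] = R[s, 1] - R[s, 0] - lam and the index of state 1 is 2.
P = np.array([[[0.5, 0.5], [0.5, 0.5]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])
lam_hat, Q_hat = learn_whittle_index(P, R, ref_state=1)
print(f"estimated index for state 1: {lam_hat:.3f}")
```

With constant stepsizes the estimate is expected to hover in a neighborhood of the true index rather than converge exactly; the ratio `beta / alpha` controls how well the two timescales separate.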

Findings

01

Algorithms successfully learn the Whittle index in numerical experiments.

02

Constant stepsize methods converge under certain conditions.

03

Deep Q-network extension improves learning in complex environments.

Abstract

We study Whittle index learning algorithms for restless multi-armed bandits, focusing on index learning with Q-learning. We first present the Q-learning algorithm with constant stepsizes under three exploration policies: epsilon-greedy, softmax, and epsilon-softmax. We then extend Q-learning to index learning for a single-armed restless bandit. The index learning algorithm is a two-timescale variant of stochastic approximation: on the slower timescale we update the index, and on the faster timescale we update the Q-values, treating the index as fixed. The Q-learning updates are asynchronous. We study this two-timescale stochastic approximation algorithm with constant stepsizes and provide an analysis of it for index learning. Further, we present a study of index learning with deep Q-network (DQN) learning and linear function…
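The three exploration policies named in the abstract can be sketched as action-probability functions. The epsilon-greedy and softmax forms are the standard ones. The epsilon-softmax form is an assumption made for illustration (greedy with probability 1 - eps, a softmax draw with probability eps), since the abstract does not define it. The function names and the `temperature` parameter are hypothetical.

```python
import numpy as np

def epsilon_greedy_probs(q, eps):
    """Uniform over all actions w.p. eps, greedy action w.p. 1 - eps."""
    q = np.asarray(q, dtype=float)
    probs = np.full(len(q), eps / len(q))
    probs[int(np.argmax(q))] += 1.0 - eps
    return probs

def softmax_probs(q, temperature=1.0):
    """Boltzmann distribution over Q-values (max shifted out for stability)."""
    q = np.asarray(q, dtype=float)
    w = np.exp((q - q.max()) / temperature)
    return w / w.sum()

def epsilon_softmax_probs(q, eps, temperature=1.0):
    """ASSUMED form: greedy w.p. 1 - eps, softmax sample w.p. eps."""
    probs = eps * softmax_probs(q, temperature)
    probs[int(np.argmax(q))] += 1.0 - eps
    return probs

def sample_action(probs, rng):
    """Draw one action index from a probability vector."""
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
q_values = [1.0, 3.0]  # Q(s, passive), Q(s, active) for some state s
action = sample_action(epsilon_softmax_probs(q_values, eps=0.2), rng)
```

Returning probability vectors rather than sampled actions keeps each policy easy to inspect and lets one sampler serve all three.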

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Optimization and Search Problems

Methods: Convolution · Dense Connections · Deep Q-Network · Q-Learning