Faster Q-Learning Algorithms for Restless Bandits

Parvish Kakarapalli; Devendra Kayande; Rahul Meshram

arXiv:2409.05908·cs.LG·September 11, 2024

Open Access

TL;DR

This paper studies Q-learning and its faster variants (SQL, GSQL, PhaseQL) under ε-greedy and UCB exploration for Whittle index learning in restless multi-armed bandits, and reports faster convergence with UCB in numerical examples.

Contribution

It combines Q-learning and its variants (SQL, GSQL, PhaseQL) with the UCB exploration policy and extends them to index learning for RMABs.

Findings

01 Q-learning with UCB converges faster than with ε-greedy.

02 PhaseQL with UCB achieves the fastest convergence among tested algorithms.

03 Numerical examples validate the improved convergence rates of proposed methods.
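
The first finding concerns replacing ε-greedy action selection with a UCB rule inside tabular Q-learning. The following is a minimal sketch of that idea, not the paper's implementation: the two-state MDP (`P_DEMO`, `R_DEMO`), the bonus constant `c`, the `1/N(s,a)` step size, and the function names are all hypothetical choices made for illustration.

```python
import math
import random


def ucb_action(q_row, counts_row, t, c=2.0):
    """Return the action maximizing Q + c*sqrt(log t / N); untried actions go first."""
    for a, n in enumerate(counts_row):
        if n == 0:
            return a
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_row, counts_row)]
    return max(range(len(scores)), key=scores.__getitem__)


def q_learning_ucb(P, R, gamma=0.9, steps=5000, c=2.0, seed=0):
    """Tabular Q-learning along one trajectory, with UCB action selection.

    P[s][a] is a list of next-state probabilities, R[s][a] a reward in [0, 1].
    """
    rng = random.Random(seed)
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    N = [[0] * n_actions for _ in range(n_states)]  # visit counts per (s, a)
    s = 0
    for t in range(1, steps + 1):
        a = ucb_action(Q[s], N[s], t, c)
        N[s][a] += 1
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        alpha = 1.0 / N[s][a]  # assumed decaying step size
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q, N


# Hypothetical two-state, two-action MDP used only for this demo.
P_DEMO = [[[0.7, 0.3], [0.1, 0.9]], [[0.5, 0.5], [0.9, 0.1]]]
R_DEMO = [[0.0, 0.2], [1.0, 0.3]]

Q, N = q_learning_ucb(P_DEMO, R_DEMO)
print("Q-table:", Q)
print("visit counts:", N)
```

Swapping `ucb_action` for an ε-greedy selector while keeping the rest fixed gives a simple way to compare the two exploration policies on the same problem.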

Abstract

We study the Whittle index learning algorithm for restless multi-armed bandits (RMAB). We first present the Q-learning algorithm and its variants: speedy Q-learning (SQL), generalized speedy Q-learning (GSQL), and phase Q-learning (PhaseQL). We also discuss two exploration policies: $\epsilon$-greedy and upper confidence bound (UCB). We extend the study of Q-learning and its variants to the UCB policy. We illustrate with a numerical example that Q-learning with the UCB exploration policy converges faster, and that PhaseQL with UCB has the fastest convergence rate. We next extend the study of Q-learning variants to index learning for RMAB. The index learning algorithm is a two-timescale variant of stochastic approximation: on the slower timescale we update the index learning scheme, and on the faster timescale we update Q-learning assuming a fixed index value. We study constant stepsizes two timescale stochastic…
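
The two-timescale structure described in the abstract can be sketched for a single arm as follows. This is an illustrative sketch under stated assumptions, not the authors' algorithm: it assumes a discounted criterion, a passive-action subsidy `lam` as the index estimate, uniform random exploration, constant step sizes `alpha` (fast) and `beta` (slow), and a hypothetical two-state arm (`P_ARM`, `R_ARM`). Action 0 is passive and action 1 is active.

```python
import math
import random


def whittle_index_two_timescale(P, R, ref_state, gamma=0.9, alpha=0.1,
                                beta=0.005, steps=20000, seed=0):
    """Estimate a subsidy `lam` that makes both actions equally good in `ref_state`.

    Fast timescale: Q-learning treats the current `lam` as (nearly) fixed.
    Slow timescale: `lam` moves toward the point where Q(ref, 1) == Q(ref, 0).
    """
    rng = random.Random(seed)
    n_states = len(R)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    lam = 0.0
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)  # uniform exploration (an assumption of this sketch)
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        # The passive action (a == 0) earns the subsidy lam on top of its reward.
        reward = R[s][a] + lam * (1 - a)
        Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
        # Slow update: raise lam while the active action is still preferred.
        lam += beta * (Q[ref_state][1] - Q[ref_state][0])
        s = s_next
    return lam, Q


# Hypothetical two-state arm used only for this demo.
P_ARM = [[[0.9, 0.1], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]]]
R_ARM = [[0.0, 0.3], [0.0, 1.0]]

lam, Q_arm = whittle_index_two_timescale(P_ARM, R_ARM, ref_state=1)
print("estimated subsidy for state 1:", lam)
```

Keeping `beta` much smaller than `alpha` is what separates the two timescales here; the uniform exploration step could be replaced by an ε-greedy or UCB rule.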

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Smart Grid Energy Management

Methods: Q-Learning