Whittle index based Q-learning for restless bandits with average reward

Konstantin E. Avrachenkov; Vivek S. Borkar

arXiv:2004.14427 · cs.LG · September 22, 2021 · 5 cites

TL;DR

This paper presents a reinforcement learning algorithm that combines Q-learning with the Whittle index for multiarmed restless bandits under the average reward criterion, yielding major computational gains and strong empirical performance.

Contribution

It introduces a Whittle index based Q-learning algorithm that exploits the structure of the Whittle index policy to reduce the search space of Q-learning, improving computational efficiency for restless bandits.
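For context, the Whittle index policy activates, at each decision epoch, the M arms whose current states carry the largest Whittle indices. The snippet below is a minimal sketch of that selection step only, assuming the per-state indices of every arm are already known; the function name `whittle_policy` and the list-of-tables layout are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def whittle_policy(index_tables, states, m):
    """Return the (sorted) arm numbers to activate this step.

    Hypothetical layout: index_tables[i][s] is the Whittle index of arm i
    in state s, and states[i] is the current state of arm i.
    """
    # Look up the index of each arm in its current state.
    current = np.array([index_tables[i][s] for i, s in enumerate(states)], dtype=float)
    # Negating gives a descending order; the stable sort breaks ties by arm number.
    order = np.argsort(-current, kind="stable")
    return sorted(order[:m].tolist())

if __name__ == "__main__":
    tables = [[0.1, 0.9], [0.5, 0.2], [0.7, 0.3]]
    print(whittle_policy(tables, [1, 0, 0], 2))
```

Because each arm contributes a single scalar per step, the decision reduces to a sort over N numbers instead of a search over all C(N, M) joint activation sets.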

Findings

01

Achieves major computational gains by reducing the search space of Q-learning.

02

Demonstrates excellent empirical performance in numerical experiments.

03

Provides a rigorous convergence analysis.

Abstract

A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme.
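The abstract does not spell out the update rules, so what follows is only a minimal sketch of a generic two-timescale scheme in the same spirit, not the authors' algorithm. For each reference state k, relative value iteration (RVI) style Q-learning runs on a fast timescale with a subsidy `lam[k]` paid for the passive action, while `lam[k]` is adjusted on a slower timescale until both actions in state k look equally attractive; that subsidy is the Whittle index of k. Assumptions (not from the paper): synchronous updates driven by a simulator rather than a single trajectory, the mean of each Q-table as the RVI offset, constant step sizes with `alpha` larger than `beta`, and the hypothetical function name `learn_whittle_indices`.

```python
import numpy as np

def learn_whittle_indices(P, R, n_iter=3000, alpha=0.1, beta=0.01, seed=0):
    """Estimate the Whittle index of every state of one arm (sketch).

    P: shape (2, n, n); P[a, s] is the next-state distribution under
       action a (0 = passive, 1 = active).
    R: shape (2, n); R[a, s] is the reward for action a in state s.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[1]
    passive = np.array([1.0, 0.0])  # the subsidy is paid only for a = 0
    Q = np.zeros((n, n, 2))         # Q[k, s, a]: one table per reference state k
    lam = np.zeros(n)               # lam[k]: current subsidy for state k
    idx = np.arange(n)
    for _ in range(n_iter):
        # Simulator call: one sampled next state per (s, a), shared by all k.
        nxt = np.array([[rng.choice(n, p=P[a, s]) for a in range(2)]
                        for s in range(n)])                     # (n, 2)
        rew = R.T[None, :, :] + lam[:, None, None] * passive    # (n, n, 2)
        v_next = Q.max(axis=2)[:, nxt]                          # (n, n, 2)
        offset = Q.mean(axis=(1, 2))[:, None, None]             # RVI offset f(Q)
        # Fast timescale: RVI-style Q-learning step for the average reward.
        Q += alpha * (rew + v_next - offset - Q)
        # Slow timescale: raise lam[k] while the active action still looks
        # better in state k, lower it otherwise.
        gap = Q[idx, idx, 1] - Q[idx, idx, 0]
        lam += beta * gap
    return lam

if __name__ == "__main__":
    # Deterministic 3-state cycle whose transitions ignore the action.
    P = np.zeros((2, 3, 3))
    for s in range(3):
        P[:, s, (s + 1) % 3] = 1.0
    R = np.array([[0.0, 1.0, 0.0], [1.0, 0.5, 2.0]])
    print(learn_whittle_indices(P, R))
```

When transitions do not depend on the action, the index of state s is simply `R[1, s] - R[0, s]`, so the example above should print values close to `[1.0, -0.5, 2.0]`; this makes a convenient sanity check before trying arms whose dynamics depend on the action.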

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Adaptive Dynamic Programming Control

Methods: Q-Learning