Whittle index based Q-learning for restless bandits with average reward
Konstantin E. Avrachenkov, Vivek S. Borkar

TL;DR
This paper presents a reinforcement learning algorithm for multiarmed restless bandits with average reward that combines Q-learning with the Whittle index, yielding major computational savings and strong empirical results.
Contribution
It introduces a Whittle index based Q-learning algorithm that exploits the structure of the Whittle index policy to shrink the search space of Q-learning, which improves computational efficiency for restless bandits.
Findings
Achieves major computational gains by reducing the search space of Q-learning.
Demonstrates excellent empirical performance in numerical experiments.
Provides a rigorous convergence analysis.
Abstract
A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme.
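For an indexable arm, the Whittle index of a state k is the subsidy λ for staying passive at which, in the single-arm average-reward problem, the passive and active actions are equally attractive in state k. The sketch below is a minimal model-based illustration of that quantity, not the authors' algorithm: it assumes the transition matrices and rewards of one arm are known, solves each subsidized problem by relative value iteration on Q-values, and bisects on λ until Q(k, 1) - Q(k, 0) changes sign. The paper's scheme instead learns the index from samples via Q-learning. The function names, the aperiodicity parameter `tau`, the bracket `[lo, hi]` and the example arm are hypothetical; the code presumes the arm is unichain under every policy and indexable.

```python
import numpy as np


def subsidized_q(P, r, lam, ref=0, tau=0.5, tol=1e-10, max_iter=100_000):
    """Relative value iteration (RVI) on Q-values for a single arm whose
    passive reward is increased by the subsidy `lam` (average reward).

    P: array of shape (2, S, S); P[u] is the transition matrix under
       action u (0 = passive, 1 = active).
    r: array of shape (S, 2); r[i, u] is the one-step reward.
    Assumes the arm is unichain under every stationary policy.
    """
    P = np.asarray(P, dtype=float)
    r_lam = np.asarray(r, dtype=float).copy()
    r_lam[:, 0] += lam  # subsidy for staying passive
    S = r_lam.shape[0]
    # Aperiodicity transformation: mixing in self-loops leaves the gain of
    # every stationary policy and the preferred action in each state
    # unchanged, but makes RVI converge even for periodic chains.
    P_tau = tau * np.eye(S) + (1.0 - tau) * P
    Q = np.zeros((S, 2))
    for _ in range(max_iter):
        V = Q.max(axis=1)
        # One Bellman backup per action, minus the value of the reference
        # state; at the fixed point V[ref] equals the optimal average reward.
        backup = np.stack([P_tau[u] @ V for u in (0, 1)], axis=1)
        Q_new = r_lam + backup - V[ref]
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    raise RuntimeError("relative value iteration did not converge")


def whittle_index(P, r, k, lo=-10.0, hi=10.0, tol=1e-6, **rvi_kwargs):
    """Bisection on the subsidy for the point where passive and active are
    equally good in state k. Presumes indexability and that [lo, hi]
    brackets the index."""

    def advantage(lam):
        Q = subsidized_q(P, r, lam, **rvi_kwargs)
        return Q[k, 1] - Q[k, 0]

    if not (advantage(lo) >= 0.0 >= advantage(hi)):
        raise ValueError("[lo, hi] does not bracket the index of state k")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if advantage(mid) > 0.0:
            lo = mid  # active still strictly preferred: index lies above mid
        else:
            hi = mid  # passive weakly preferred: index lies at or below mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical 3-state maintenance arm (made-up numbers): state 0 is
    # "good", state 2 is "worn". Passive earns production and drifts toward
    # state 2; active (repair) earns nothing but resets toward state 0.
    # Whether this particular arm is indexable is not checked here.
    P_passive = [[0.6, 0.3, 0.1], [0.1, 0.6, 0.3], [0.1, 0.1, 0.8]]
    P_active = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.7, 0.1, 0.2]]
    r = [[1.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
    for k in range(3):
        print(k, round(whittle_index([P_passive, P_active], r, k), 4))
```

A quick sanity check: if both actions share the same transition matrix, the index of state k reduces to r[k, 1] - r[k, 0]. Replacing the exact expectations `P_tau[u] @ V` with sampled transitions and step sizes, and moving λ on a slower timescale in the direction of Q(k, 1) - Q(k, 0), gives a learning scheme in the spirit of the paper; the exact update rules and step-size conditions should be taken from the paper itself.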
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Adaptive Dynamic Programming Control
Methods: Q-Learning
