On learning Whittle index policy for restless bandits with scalable regret

Nima Akbarzadeh; Aditya Mahajan

arXiv:2202.03463·cs.LG·April 28, 2023

Open Access

TL;DR

This paper introduces a model-based reinforcement learning algorithm for restless bandit problems that achieves scalable regret bounds, improving over traditional RL methods by exploiting problem structure.

Contribution

The paper proposes a Thompson-sampling based RL algorithm tailored for restless bandits with regret bounds that scale independently of the state space size.

Findings

01

Regret scales as $mn$ or $n^2$.

02

Under certain conditions, regret scales as $n^{1.5}$ or $\max\{m, n\}$.

03

Numerical examples demonstrate the effectiveness of the proposed algorithm.
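To make the scaling in the findings concrete, the sketch below compares the size of the joint problem with the factors $mn$ and $n^2$. It reads $n$ as the number of arms and $m$ as the number of arms activated per step, and it assumes every arm has the same number of local states `s` and that exactly `m` arms are activated; these are illustrative assumptions, and the function names are hypothetical.

```python
from math import comb


def joint_sizes(n, m, s):
    """Sizes of the joint MDP obtained by stacking n arms with s local states each.

    Assumes exactly m of the n arms are activated per step (illustrative choice).
    Returns (number of joint states, number of joint actions).
    """
    return s ** n, comb(n, m)


def structured_factors(n, m):
    """The two arm-count factors quoted in the findings: m*n and n^2."""
    return m * n, n ** 2


if __name__ == "__main__":
    for n, m, s in [(5, 2, 3), (10, 3, 3), (20, 5, 3)]:
        states, actions = joint_sizes(n, m, s)
        mn, n2 = structured_factors(n, m)
        print(f"n={n:2d} m={m} s={s}: joint states={states}, "
              f"joint actions={actions}, mn={mn}, n^2={n2}")
```

The joint state count grows exponentially in `n`, while both structured factors grow polynomially, which is the sense in which a bound stated in terms of `m` and `n` is called scalable.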

Abstract

Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies based on data when the system model is unknown. However, the cumulative regret of most RL algorithms scales as $\tilde{O}(S\sqrt{AT})$, where $S$ is the size of the state space, $A$ is the size of the action space, $T$ is the horizon, and the $\tilde{O}(\cdot)$ notation hides logarithmic terms. Due to the linear dependence on the size of the state space, these regret bounds are prohibitively large for resource allocation and scheduling problems. In this paper, we present a model-based RL algorithm for such problems which has scalable regret. In particular, we consider a restless bandit model, and propose a Thompson-sampling based learning algorithm which is tuned to the underlying structure of the model. We present two characterizations of the…
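The abstract does not spell out the algorithm, so the following is only a generic sketch of per-arm Thompson (posterior) sampling, not the authors' method. It assumes finite local state spaces, two actions per arm (0 = passive, 1 = active), and an independent Dirichlet prior on each transition row; the class `ArmPosterior`, the function `select_arms`, and the index values passed to it are all hypothetical. Computing Whittle indices from a sampled model is deliberately left out.

```python
import random


class ArmPosterior:
    """Dirichlet posterior over one arm's transition rows (assumed prior)."""

    def __init__(self, num_states, prior=1.0):
        self.num_states = num_states
        # counts[action][state][next_state]; action 0 = passive, 1 = active
        self.counts = [
            [[prior] * num_states for _ in range(num_states)]
            for _ in range(2)
        ]

    def update(self, state, action, next_state):
        """Record one observed local transition of this arm."""
        self.counts[action][state][next_state] += 1.0

    def sample(self, rng):
        """Draw a transition model; each row is Dirichlet via normalized Gamma draws."""
        model = []
        for action in range(2):
            rows = []
            for state in range(self.num_states):
                draws = [rng.gammavariate(a, 1.0) for a in self.counts[action][state]]
                total = sum(draws)
                rows.append([d / total for d in draws])
            model.append(rows)
        return model


def select_arms(indices, m):
    """Activate the m arms with the largest index values (index policy step)."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return sorted(order[:m])


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [ArmPosterior(num_states=3) for _ in range(4)]
    arms[0].update(state=0, action=1, next_state=2)
    sampled = [arm.sample(rng) for arm in arms]  # one sampled model per arm
    # Placeholder index values; a real implementation would derive them
    # from each sampled model and the arm's current state.
    print(select_arms([0.1, 0.9, 0.5, 0.3], m=2))
```

Because each arm keeps its own posterior, the number of learned parameters grows with the number of arms times the local model size rather than with the joint state space, which is the kind of structure the abstract says the algorithm is tuned to.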

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management