Indexability and Rollout Policy for Multi-State Partially Observable Restless Bandits

Rahul Meshram; Kesav Kaza

arXiv:2108.00892·cs.LG·August 3, 2021

TL;DR

This paper studies multi-state partially observable restless bandits. It establishes structural properties and indexability, and proposes Monte Carlo rollout policies for different information scenarios, with applications in communication and recommendation systems.

Contribution

It introduces new structural properties, indexability results, and Monte Carlo rollout policies for three models of partially observable restless bandits, including explicit index formulas.

Findings

1. The Monte Carlo rollout policy performs competitively with the myopic policy.
2. Indexability is established for models 2 and 3.
3. An explicit index formula is derived for model 3.

Abstract

Restless multi-armed bandits with partially observable states have applications in communication systems, age of information, and recommendation systems. In this paper, we study multi-state partially observable restless bandit models. We consider three different models based on the information observable to the decision maker: 1) no information is observable from actions on a bandit; 2) perfect information from a bandit is observable for only one action on the bandit, and there is a fixed restart state, i.e., a transition occurs from all other states to that state; 3) perfect state information is available to the decision maker for both actions on a bandit, and there are two restart states, one for each action. We develop structural properties of these models. We also show a threshold-type policy and indexability for models 2 and 3. We present a Monte Carlo (MC) rollout policy. We use it for Whittle index computation in case of model…
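To make the three information models concrete, here is a minimal Python sketch of how a decision maker's belief over an arm's state might evolve in each case. It is one reading of the abstract, not the authors' formulation: the function names (`propagate`, `point_mass`, `next_belief`), the choice of action 1 as the observed action in model 2, and the example transition matrices are all assumptions made for illustration.

```python
def propagate(belief, P):
    """Push a belief vector one step through transition matrix P (no observation)."""
    n = len(belief)
    return [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]


def point_mass(n, state):
    """Belief that puts all probability on a single known state."""
    return [1.0 if s == state else 0.0 for s in range(n)]


def next_belief(model, belief, action, P, restart=None):
    """Hypothetical belief update for the three information models.

    model 1: nothing is observed, so the belief is only propagated.
    model 2: assumed here that action 1 reveals the state and sends the arm
             to a fixed restart state; action 0 reveals nothing.
    model 3: both actions reveal the state, each with its own restart state.
    """
    n = len(belief)
    if model == 1:
        return propagate(belief, P[action])
    if model == 2:
        if action == 1:
            return point_mass(n, restart)
        return propagate(belief, P[action])
    if model == 3:
        return point_mass(n, restart[action])
    raise ValueError("model must be 1, 2, or 3")


# Illustrative two-state arm; the numbers are made up for this sketch.
P = {0: [[0.9, 0.1], [0.2, 0.8]], 1: [[0.7, 0.3], [0.4, 0.6]]}
b = [0.5, 0.5]
print(next_belief(1, b, 0, P))                        # belief drifts, stays uncertain
print(next_belief(2, b, 1, P, restart=0))             # belief collapses to restart state
print(next_belief(3, b, 0, P, restart={0: 1, 1: 0}))  # per-action restart state
```

Under this reading, the belief in models 2 and 3 repeatedly collapses to a known point, which is the kind of structure that typically makes threshold policies and index computations tractable; in model 1 the belief never collapses.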

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Game Theory and Applications