On the Whittle Index for Restless Multi-armed Hidden Markov Bandits
Rahul Meshram, D. Manjunath, Aditya Gopalan

TL;DR
This paper studies a restless multi-armed bandit in which each arm's state is hidden and must be inferred from noisy binary signals, and proposes using Whittle's index to select arms so as to maximize the infinite-horizon discounted reward.
Contribution
It introduces a framework for hidden Markov bandits with partial observations and develops an approximate index policy based on Whittle's index for arm selection.
Findings
The single-armed bandit admits an approximate threshold policy.
The bandit satisfies an approximate indexability property.
Numerical examples validate the analytical results.
Abstract
We consider a restless multi-armed bandit in which each arm can be in one of two states. When an arm is sampled, the state of the arm is not available to the sampler. Instead, a binary signal with a known randomness that depends on the state of the arm is available. No signal is available if the arm is not sampled. An arm-dependent reward is accrued from each sampling. In each time step, each arm changes state according to known transition probabilities which in turn depend on whether the arm is sampled or not sampled. Since the state of the arm is never visible and has to be inferred from the current belief and a possible binary signal, we call this the hidden Markov bandit. Our interest is in a policy to select the arm(s) in each time step that maximizes the infinite horizon discounted reward. Specifically, we seek the use of Whittle's index in selecting the arms. We first analyze the…
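The belief tracking described in the abstract can be sketched for a single arm as follows. This is only an illustration under stated assumptions, not the authors' formulation: the names `belief_update`, `rho`, and `trans` are hypothetical, the belief is taken to be the probability that the arm is in state 1, and the signal is assumed to be observed before the state transition occurs within a time step.

```python
def belief_update(belief, sampled, signal, rho, trans):
    """Return the next belief P(state = 1) for one two-state arm.

    Assumed (hypothetical) parameterization:
      belief      -- current P(state = 1)
      sampled     -- True if the arm is sampled in this step
      signal      -- observed binary signal (0 or 1); ignored if not sampled
      rho[s]      -- P(signal = 1 | state = s)
      trans[a][s] -- P(next state = 1 | state = s, action = a),
                     with a = 1 for sampled and a = 0 for not sampled
    """
    if sampled:
        # Bayes' rule: condition the belief on the observed binary signal.
        like1 = rho[1] if signal == 1 else 1.0 - rho[1]
        like0 = rho[0] if signal == 1 else 1.0 - rho[0]
        num = belief * like1
        den = num + (1.0 - belief) * like0
        posterior = num / den if den > 0 else belief
    else:
        # No signal is available when the arm is not sampled.
        posterior = belief

    # Propagate through the action-dependent transition probabilities.
    a = 1 if sampled else 0
    return posterior * trans[a][1] + (1.0 - posterior) * trans[a][0]


# Illustrative numbers only (not taken from the paper).
rho = (0.2, 0.8)                    # P(signal = 1 | state 0), P(signal = 1 | state 1)
trans = ((0.1, 0.9), (0.1, 0.9))    # trans[action][state]
next_belief = belief_update(0.5, True, 1, rho, trans)
```

A policy of the kind the abstract describes would then act on this scalar belief per arm, for example by comparing it with a threshold in the single-armed case.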
