General Formulation and PCL-Analysis for Restless Bandits with Limited Observability
Keqin Liu, Qizhen Jia

TL;DR
This paper introduces a general observation model for restless bandits with limited, noisy observations, analyzing indexability and proposing an approximation algorithm with strong numerical performance.
Contribution
It formulates a new restless bandit model with partial, error-prone observations and develops an approximation method that combines partial conservation law (PCL) analysis with the AG algorithm of Niño-Mora (2001) to obtain practical solutions.
Findings
The proposed algorithm performs excellently in numerical experiments.
The model effectively captures limited and noisy observation scenarios.
Indexability and the priority index (Whittle index) are analyzed using the achievable region method with PCL.
Abstract
In this paper, we consider a general observation model for restless multi-armed bandit problems. The player acts on a past observation history that is limited (partial) and error-prone due to resource constraints or environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of the observation process, we formulate the problem as a restless bandit with an infinite, high-dimensional belief state space. We apply the achievable region method with the partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Niño-Mora (2001) for finite-state problems can be applied. Numerical experiments show that our algorithm has excellent performance.
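To make the notion of a belief state concrete, the following minimal sketch shows a Bayesian belief update for a single arm. It is an illustration under simplifying assumptions, not the paper's general model: the arm is assumed to be a two-state Markov chain (states 0 and 1), observations are assumed binary, and the names `p01`, `p11`, `miss`, and `false_alarm` are hypothetical parameters introduced here. The belief `omega` is the probability that the arm is in state 1 given the observation history.

```python
def propagate(omega, p01, p11):
    """One-step Markov prediction of P(state = 1).

    p01 and p11 are assumed transition probabilities into state 1
    from state 0 and state 1, respectively.
    """
    return omega * p11 + (1.0 - omega) * p01


def bayes_correct(omega, obs, miss, false_alarm):
    """Posterior P(state = 1) after a noisy binary observation.

    Assumed error model (hypothetical):
      P(obs = 0 | state = 1) = miss
      P(obs = 1 | state = 0) = false_alarm
    """
    like1 = (1.0 - miss) if obs == 1 else miss
    like0 = false_alarm if obs == 1 else (1.0 - false_alarm)
    num = like1 * omega
    den = num + like0 * (1.0 - omega)
    # If the observation has zero probability under the belief, keep the prior.
    return num / den if den > 0 else omega


def update_belief(omega, obs, p01, p11, miss, false_alarm):
    """Belief at the next step; obs is None when the arm was not observed."""
    if obs is not None:
        omega = bayes_correct(omega, obs, miss, false_alarm)
    return propagate(omega, p01, p11)


if __name__ == "__main__":
    omega = 0.5
    # Observed, unobserved, observed: a limited observation history.
    for obs in [1, None, 0]:
        omega = update_belief(omega, obs, p01=0.2, p11=0.8,
                              miss=0.1, false_alarm=0.2)
        print(obs, omega)
```

Even in this simplified setting, repeated updates can reach infinitely many distinct belief values, which is one way to see why a finite-state method cannot be applied directly and some approximation of the belief space is needed first.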
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reservoir Engineering and Simulation Methods
