Low-Complexity Algorithm for Restless Bandits with Imperfect Observations
Keqin Liu, Richard Weber, Chengzhong Zhang

TL;DR
This paper introduces a low-complexity algorithm for restless bandit problems with imperfect observations, simplifying dynamic programming and achieving near-optimal performance in complex, error-prone settings.
Contribution
It proposes a novel approach to reduce complexity in restless bandits with observation errors and establishes conditions for indexability and optimality.
Findings
The algorithm achieves strong performance in simulations.
Results are near-optimal across a general parameter space.
The algorithm is proven optimal for homogeneous systems.
Abstract
We consider a class of restless bandit problems that finds a broad application area in reinforcement learning and stochastic optimization. We consider N independent discrete-time Markov processes, each of which has two possible states: 1 and 0 ('good' and 'bad'). Only if a process is both in state 1 and observed to be so does reward accrue. The aim is to maximize the expected discounted sum of returns over the infinite horizon subject to a constraint that only M (< N) processes may be observed at each step. Observation is error-prone: there are known probabilities that state 1 (0) will be observed as 0 (1). From this one knows, at any time t, a probability that process i is in state 1. The resulting system may be modeled as a restless multi-armed bandit problem with an information state space of uncountable cardinality. Restless bandit problems with even finite state spaces are…
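The belief described in the abstract (the probability that a process is in state 1, given error-prone observations) can be illustrated with a short Bayes update. This is a minimal sketch, not the paper's algorithm: the parameter names p11, p01, miss and false_alarm are hypothetical, and the ordering (observe first, then let the chain make one transition) is an assumption made for illustration.

```python
from typing import Optional


def propagate(belief: float, p11: float, p01: float) -> float:
    """One Markov transition applied to the belief P(state = 1).

    p11 = P(next = 1 | current = 1), p01 = P(next = 1 | current = 0).
    Both names are assumptions, not notation taken from the paper.
    """
    return belief * p11 + (1.0 - belief) * p01


def posterior(belief: float, observed: int, miss: float, false_alarm: float) -> float:
    """Bayes update of P(state = 1) after an error-prone observation.

    miss        = P(observe 0 | state 1)   (assumed name)
    false_alarm = P(observe 1 | state 0)   (assumed name)
    """
    if observed == 1:
        num = belief * (1.0 - miss)
        den = num + (1.0 - belief) * false_alarm
    else:
        num = belief * miss
        den = num + (1.0 - belief) * (1.0 - false_alarm)
    # If the observation has zero probability under the model,
    # keep the prior belief instead of dividing by zero.
    return num / den if den > 0.0 else belief


def next_belief(
    belief: float,
    observation: Optional[int],
    p11: float,
    p01: float,
    miss: float,
    false_alarm: float,
) -> float:
    """Belief at the next step; observation is None when the process is not observed."""
    if observation is not None:
        belief = posterior(belief, observation, miss, false_alarm)
    return propagate(belief, p11, p01)


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the paper.
    print(next_belief(0.5, 1, p11=0.8, p01=0.2, miss=0.1, false_alarm=0.1))
    print(next_belief(0.5, None, p11=0.8, p01=0.2, miss=0.1, false_alarm=0.1))
```

Because the belief can take any value in [0, 1], each process carries a continuous information state, which is why the abstract describes the state space as having uncountable cardinality.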
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management
