Low-Complexity Algorithm for Restless Bandits with Imperfect Observations

Keqin Liu; Richard Weber; Chengzhong Zhang

arXiv:2108.03812·cs.LG·May 14, 2024

Open Access

TL;DR

This paper introduces a low-complexity algorithm for restless bandit problems with imperfect observations, simplifying dynamic programming and achieving near-optimal performance in complex, error-prone settings.

Contribution

It proposes a novel approach to reduce complexity in restless bandits with observation errors and establishes conditions for indexability and optimality.

Findings

01

Algorithm achieves strong performance in simulations.

02

Near-optimal results across the general parameter space.

03

Proven optimality for homogeneous systems.

Abstract

We consider a class of restless bandit problems that has broad applications in reinforcement learning and stochastic optimization. We consider $N$ independent discrete-time Markov processes, each of which has two possible states: 1 and 0 ('good' and 'bad'). Reward accrues only if a process is both in state 1 and observed to be so. The aim is to maximize the expected discounted sum of returns over the infinite horizon, subject to the constraint that only $M$ $(< N)$ processes may be observed at each step. Observation is error-prone: there are known probabilities that state 1 (0) will be observed as 0 (1). From this one knows, at any time $t$, a probability that process $i$ is in state 1. The resulting system may be modeled as a restless multi-armed bandit problem with an information state space of uncountable cardinality. Restless bandit problems with even finite state spaces are…
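
The belief that a process is in state 1, as described in the abstract, can be illustrated with a standard Bayesian filter for a two-state Markov chain. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `update_belief` and the parameters `p01`, `p11` (transition probabilities into state 1 from states 0 and 1), `eps` (probability that state 1 is observed as 0), and `delta` (probability that state 0 is observed as 1) are hypothetical names chosen for illustration. It also assumes the observation is made before the state transitions.

```python
from typing import Optional


def update_belief(omega: float, obs: Optional[int],
                  p01: float, p11: float,
                  eps: float, delta: float) -> float:
    """Return the next-step probability that a process is in state 1.

    omega: current belief that the process is in state 1.
    obs:   1 or 0 if the process was observed this step, None otherwise.
    p01, p11: assumed transition probabilities into state 1.
    eps:   assumed probability that state 1 is observed as 0.
    delta: assumed probability that state 0 is observed as 1.
    """
    if obs is None:
        # Not observed: no new information about the current state.
        posterior = omega
    elif obs == 1:
        # Bayes' rule for an observation of 1.
        num = omega * (1.0 - eps)
        den = num + (1.0 - omega) * delta
        posterior = num / den if den > 0 else omega
    else:
        # Bayes' rule for an observation of 0.
        num = omega * eps
        den = num + (1.0 - omega) * (1.0 - delta)
        posterior = num / den if den > 0 else omega
    # Propagate the posterior through one step of the Markov chain.
    return posterior * p11 + (1.0 - posterior) * p01


if __name__ == "__main__":
    belief = 0.5
    for obs in [1, None, 0]:
        belief = update_belief(belief, obs, p01=0.2, p11=0.8, eps=0.1, delta=0.2)
        print(obs, round(belief, 4))
```

Because the belief takes values in the continuum $[0, 1]$, each arm's information state is uncountable, which is the source of the difficulty the abstract mentions.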

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management