Optimality of Myopic Policy for Restless Multiarmed Bandit with Imperfect Observation

Kehao Wang

arXiv:1602.00195 · math.OC · February 2, 2016 · GLOBECOM

TL;DR

This paper analyzes a restless multi-armed bandit (RMAB) problem with imperfect observations and establishes conditions under which the simple myopic policy is optimal, giving a computationally tractable alternative to the generally intractable optimal policy.

Contribution

The paper derives analytical, closed-form conditions that guarantee the optimality of the myopic policy for an RMAB with multi-state Markov projects and imperfect observations.

Findings

01. Conditions for myopic policy optimality derived

02. Analytical solutions for a class of RMAB problems

03. Simplified policy implementation with proven optimality

Abstract

We consider the scheduling problem concerning N projects. Each project evolves as a multi-state Markov process. At each time instant, one project is scheduled to work, and some reward depending on the state of the chosen project is obtained. The objective is to design a scheduling policy that maximizes the expected accumulated discounted reward over a finite or infinite horizon. The considered problem can be cast into a restless multi-armed bandit (RMAB) problem that is of fundamental importance in decision theory. It is well-known that solving the RMAB problem is PSPACE-hard, with the optimal policy usually intractable due to the exponential computation complexity. A natural alternative is to consider the easily implementable myopic policy that maximizes the immediate reward. In this paper, we perform an analytical study on the considered RMAB problem, and establish a set of…
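
To make the myopic policy described in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's formulation: each project is tracked by a belief vector over its states, all projects are assumed to share one transition matrix `P`, and imperfect observation is modeled by a hypothetical matrix `obs_matrix[s][o]` giving the probability of seeing observation `o` in state `s`. The function names are invented for this example, and the optimality conditions themselves are not reproduced here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def expected_reward(belief: Vector, rewards: Vector) -> float:
    """Expected immediate reward of a project given its belief over states."""
    return sum(b * r for b, r in zip(belief, rewards))


def myopic_choice(beliefs: List[Vector], rewards: Vector) -> int:
    """Index of the project with the largest expected immediate reward.

    Ties go to the lowest index (an arbitrary choice for this sketch).
    """
    return max(range(len(beliefs)),
               key=lambda i: expected_reward(beliefs[i], rewards))


def propagate(belief: Vector, P: Matrix) -> Vector:
    """One-step Markov evolution of a belief: new[j] = sum_i belief[i] * P[i][j]."""
    n = len(P)
    return [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]


def bayes_update(belief: Vector, obs_matrix: Matrix, obs: int) -> Vector:
    """Posterior belief after a noisy observation (assumed observation model)."""
    unnorm = [b * obs_matrix[s][obs] for s, b in enumerate(belief)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def step(beliefs: List[Vector], rewards: Vector, P: Matrix,
         obs_matrix: Matrix, obs: int) -> List[Vector]:
    """One slot: schedule the myopic project, update it with `obs`, evolve all.

    Every project keeps evolving whether or not it is scheduled, which is
    what makes the bandit "restless".
    """
    chosen = myopic_choice(beliefs, rewards)
    updated = []
    for i, b in enumerate(beliefs):
        if i == chosen:
            b = bayes_update(b, obs_matrix, obs)
        updated.append(propagate(b, P))
    return updated


# Hypothetical two-state example: state 1 pays reward 1, state 0 pays 0.
beliefs = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
rewards = [0.0, 1.0]
choice = myopic_choice(beliefs, rewards)  # project 1 (expected reward 0.8)
```

The policy only needs a sort or argmax over the current beliefs in each slot, which is why it is easy to implement compared with solving the full RMAB.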

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Age of Information Optimization