On reachability of Markov decision processes: a novel state-classification-based PI approach
Yanyun Li, Xin Guo, Xianping Guo

TL;DR
This paper presents a new state-classification-based policy iteration method for Markov decision processes, enabling efficient computation of optimal policies that maximize system reliability from any initial state.
Contribution
It introduces a novel optimality equation and a finite-step policy iteration algorithm leveraging absorbing sets for reliability maximization in Markov decision processes.
Findings
Finite-step convergence of the policy iteration algorithm.
Characterization and computation of absorbing sets.
Application example in reliability and maintenance.
Abstract
This paper studies the reliability of a discrete-time controlled Markov system with finite states and actions, and aims to give an efficient algorithm for obtaining an optimal (control) policy that maximizes the system's reliability for every initial state. After establishing the existence of an optimal policy, we introduce the concept of an absorbing set of a stationary policy, and give characterizations of, and a computational method for, the absorbing sets. Using the largest absorbing set, we build a novel optimality equation (OE) and prove that its solution is unique. Furthermore, we provide a policy iteration algorithm for computing optimal policies, and prove that an optimal policy and the maximal reliability can be obtained in a finite number of iterations. Finally, an example in reliability and maintenance problems is…
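To make the policy iteration idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' state-classification algorithm or their optimality equation, neither of which this page reproduces. It is a generic textbook-style policy iteration for the maximal probability of eventually reaching a target set, and it assumes that "reliability" can be modeled that way, which the title suggests but the abstract does not define. All names (`P`, `can_reach`, `evaluate`, `policy_iteration`) and the four-state example are hypothetical. States that cannot reach the target under the current policy are pinned to value 0 so the linear system has a unique solution; this is only loosely analogous to the role the abstract gives to absorbing sets.

```python
from fractions import Fraction


def can_reach(P, policy, target):
    """States that reach `target` with positive probability under `policy`."""
    reach = set(target)
    changed = True
    while changed:
        changed = False
        for s in range(len(P)):
            if s in reach:
                continue
            if any(p > 0 and t in reach for t, p in P[s][policy[s]].items()):
                reach.add(s)
                changed = True
    return reach


def evaluate(P, policy, target):
    """Exact reachability probabilities under a fixed stationary policy."""
    n = len(P)
    reach = can_reach(P, policy, target)
    # Only states that can reach the target (and are not in it) are unknowns;
    # every other non-target state has value 0.
    unknown = [s for s in range(n) if s in reach and s not in target]
    idx = {s: i for i, s in enumerate(unknown)}
    m = len(unknown)
    # Augmented matrix for (I - Q) v = b, with b = one-step probability of hitting target.
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for s in unknown:
        i = idx[s]
        A[i][i] += 1
        for t, p in P[s][policy[s]].items():
            if t in target:
                A[i][m] += Fraction(p)
            elif t in idx:
                A[i][idx[t]] -= Fraction(p)
    # Gauss-Jordan elimination in exact rational arithmetic.
    for c in range(m):
        piv = next(r for r in range(c, m) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [x / d for x in A[c]]
        for r in range(m):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    v = [Fraction(0)] * n
    for s in target:
        v[s] = Fraction(1)
    for s in unknown:
        v[s] = A[idx[s]][m]
    return v


def policy_iteration(P, target, policy=None):
    """Switch actions only on strict improvement; stop when no state improves."""
    n = len(P)
    policy = list(policy) if policy is not None else [0] * n
    while True:
        v = evaluate(P, policy, target)
        changed = False
        for s in range(n):
            if s in target:
                continue
            best = v[s]
            for a, row in enumerate(P[s]):
                q = sum(Fraction(p) * v[t] for t, p in row.items())
                if q > best:
                    best, policy[s], changed = q, a, True
        if not changed:
            return policy, v


# Hypothetical example: P[s][a] maps next state -> probability.
# State 2 is the target, state 3 is an absorbing failure state.
half = Fraction(1, 2)
P = [
    [{0: 1}, {1: half, 3: half}],                        # state 0: idle forever, or try
    [{2: Fraction(3, 4), 3: Fraction(1, 4)}, {2: half, 0: half}],  # state 1
    [{2: 1}],                                            # state 2: target
    [{3: 1}],                                            # state 3: failure
]
policy, v = policy_iteration(P, {2})
# policy == [1, 0, 0, 0]; v == [3/8, 3/4, 1, 0]
```

Exact `Fraction` arithmetic is used so that the strict-improvement test is not affected by floating-point rounding. Because the policy changes only when some state strictly improves, and there are finitely many stationary policies, the loop terminates after finitely many iterations in this sketch.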
Taxonomy
Topics: Fuel Cells and Related Materials
