Extreme occupation measures in Markov decision processes with a cemetery
Alexey Piunovskiy, Yi Zhang

TL;DR
This paper analyzes extreme occupation measures in Markov decision processes with an absorbing cemetery state. It proves that every finite extreme occupation measure is generated by a deterministic stationary strategy and that, under mild conditions, a constrained problem with J constraints is solved by a mixture of at most J+1 such strategies.
Contribution
It establishes that finite extreme occupation measures are generated by deterministic stationary strategies and characterizes solutions to constrained MDPs as mixtures of a bounded number of these strategies.
Findings
Finite extreme occupation measures are generated by deterministic stationary strategies.
Under mild conditions, solutions to constrained MDPs with J constraints can be expressed as mixtures of at most J+1 such strategies.
Both results rest on the assumption that strategies inducing infinite occupation measures are not optimal.
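The mixture result can be illustrated on a toy model. The sketch below is not taken from the paper: the single transient state, the two actions, and every number in it are assumptions chosen for illustration. It computes the occupation measures of the two deterministic stationary strategies, then forms the convex combination that makes the single constraint (J = 1) tight, so two occupation measures suffice.

```python
# Toy model (all numbers are illustrative assumptions, not from the paper):
# one transient state x, a cemetery, and two actions. Under action a the
# process jumps to the cemetery with probability exit_prob[a] and otherwise
# stays in x.
exit_prob = [0.5, 0.25]
cost = [1.0, 3.0]              # objective cost per visit to (x, a)
constraint_cost = [2.0, 0.5]   # constraint cost per visit to (x, a)
bound = 3.0                    # the single constraint: total constraint cost <= bound


def occupation_measure(action):
    """Expected number of visits to each (x, a) when `action` is always played."""
    visits = 1.0 / exit_prob[action]  # mean number of steps spent in x
    return [visits if a == action else 0.0 for a in range(2)]


def total(costs, measure):
    """Total undiscounted expected cost as an integral against the measure."""
    return sum(c * m for c, m in zip(costs, measure))


mu0, mu1 = occupation_measure(0), occupation_measure(1)
g0, g1 = total(constraint_cost, mu0), total(constraint_cost, mu1)

# Always playing action 0 is cheaper but violates the constraint (g0 > bound),
# so pick the weight on mu0 that makes the constraint hold with equality.
weight = (bound - g1) / (g0 - g1)
mixture = [weight * a + (1 - weight) * b for a, b in zip(mu0, mu1)]
objective = total(cost, mixture)

print(weight, mixture, objective)
```

In this toy model every occupation measure (m0, m1) satisfies 0.5·m0 + 0.25·m1 = 1, so the set of occupation measures is the segment between the two deterministic ones, and the constrained optimum lies on that segment.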
Abstract
In this paper, we consider a Markov decision process (MDP) with a Borel state space X ∪ {Δ}, where Δ is an absorbing state (cemetery), and a Borel action space A. We consider the space of finite occupation measures restricted to X × A, and the extreme points in it. It is possible that some strategies have infinite occupation measures. Nevertheless, we prove that every finite extreme occupation measure is generated by a deterministic stationary strategy. Then, for this MDP, we consider a constrained problem with total undiscounted criteria and J constraints, where the cost functions are nonnegative. By assumption, the strategies inducing infinite occupation measures are not optimal. Then, our second main result is that, under mild conditions, the solution to this constrained MDP is given by a mixture of no more than …
Taxonomy
Topics: Probability and Risk Models · Optimization and Search Problems
