Transience in Countable MDPs
Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Patrick Totzke

TL;DR
This paper studies the Transience objective in countably infinite Markov Decision Processes (MDPs): when ε-optimal and optimal strategies exist, how much memory they need, and what universal transience implies for other objectives.
Contribution
It proves fundamental properties of Transience in countably infinite MDPs: which kinds of strategies exist, their strategy complexity (memory and randomization requirements), and how universal transience affects the strategy complexity of other objectives.
Findings
Uniformly ε-optimal memoryless deterministic (MD) strategies for Transience exist, even in infinitely branching MDPs.
Optimal strategies need not exist, even in finitely branching MDPs, but if an optimal strategy exists then there is also an optimal MD strategy.
Universal transience (almost sure transience under all strategies) lowers the strategy complexity of other objectives such as Safety and Parity.
Abstract
The Transience objective is not to visit any state infinitely often. While this is not possible in any finite Markov Decision Process (MDP), it can be satisfied in countably infinite ones, e.g., if the transition graph is acyclic. We prove the following fundamental properties of Transience in countably infinite MDPs. 1. There exist uniformly ε-optimal MD strategies (memoryless deterministic) for Transience, even in infinitely branching MDPs. 2. Optimal strategies for Transience need not exist, even if the MDP is finitely branching. However, if an optimal strategy exists then there is also an optimal MD strategy. 3. If an MDP is universally transient (i.e., almost surely transient under all strategies) then many other objectives have a lower strategy complexity than in general MDPs. E.g., ε-optimal strategies for Safety and co-Büchi and optimal strategies for…
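To make the Transience objective concrete, here is a minimal Python sketch of a toy MDP on the states 0, 1, 2, ... It is not taken from the paper: the actions `advance` and `stay`, the helper names, and the deterministic transitions are all assumptions chosen for illustration. A finite prefix of a run can only suggest, not prove, transience, so the visit counts below are an illustration rather than a decision procedure.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical toy MDP (not from the paper): states are the naturals.
# Action "advance" moves n -> n + 1, action "stay" loops n -> n.
# Transitions are deterministic here purely to keep the sketch short.
def step(state: int, action: str) -> int:
    if action == "advance":
        return state + 1
    if action == "stay":
        return state
    raise ValueError(f"unknown action: {action}")

# A memoryless deterministic (MD) strategy maps each state to one action.
Strategy = Callable[[int], str]

def run_prefix(strategy: Strategy, start: int, steps: int) -> List[int]:
    """Return the first `steps` transitions of the run from `start`."""
    path = [start]
    for _ in range(steps):
        path.append(step(path[-1], strategy(path[-1])))
    return path

def max_visits(path: List[int]) -> int:
    """Largest number of times any single state occurs on the prefix."""
    return max(Counter(path).values())

def always_advance(state: int) -> str:
    return "advance"

def always_stay(state: int) -> str:
    return "stay"

if __name__ == "__main__":
    # Under always_advance no state repeats; under always_stay the
    # start state is visited at every step of the prefix.
    print(max_visits(run_prefix(always_advance, 0, 100)))
    print(max_visits(run_prefix(always_stay, 0, 100)))
```

Under `always_advance` the run never returns to a state, matching the acyclic case mentioned in the abstract, while `always_stay` revisits the start state forever and so violates Transience.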
