Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes
Jan Křetínský, Tobias Meggendorfer

TL;DR
This paper introduces techniques that speed up strategy iteration for mean payoff in Markov decision processes by orders of magnitude on many models, removing its performance disadvantage relative to value iteration while preserving its advantages.
Contribution
The authors develop several techniques that make strategy iteration for mean payoff in MDPs efficient enough to be applied to larger models.
Findings
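Strategy iteration can be made to scale for mean payoff in MDPs.
The proposed techniques speed up strategy iteration by orders of magnitude on many MDPs, eliminating its performance disadvantage relative to value iteration.
The speed-ups preserve the advantages of strategy iteration, such as precision and domain-knowledge-aware initialization.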
Abstract
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance-related properties. Strategy iteration is one of the solution techniques applicable in this context. While in many other contexts it is the technique of choice due to advantages over e.g. value iteration, such as precision or the possibility of domain-knowledge-aware initialization, it is rarely used for MDPs, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.
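For readers unfamiliar with the baseline, the following is a minimal sketch of textbook (Howard-style) strategy iteration for mean payoff, not the accelerated techniques of the paper. It assumes a unichain MDP, so that every strategy has a single gain value, and uses exact rational arithmetic to illustrate the precision advantage mentioned in the abstract. The MDP encoding, the function names (`solve_linear`, `evaluate`, `strategy_iteration`), and the two-state example are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Assumes a non-singular system (holds for unichain strategies).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [v * inv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def evaluate(mdp, strategy):
    """Return (gain, bias) of a strategy by solving
    g + h(s) = r(s) + sum_t P(s, t) * h(t), with h(0) fixed to 0."""
    n = len(mdp)
    A, b = [], []
    for s in range(n):
        reward, succ = mdp[s][strategy[s]]
        row = [Fraction(0)] * n
        row[0] = Fraction(1)          # column 0 holds the gain g
        if s != 0:
            row[s] += 1               # column s (s >= 1) holds h(s)
        for t, p in succ.items():
            if t != 0:
                row[t] -= Fraction(p)
        A.append(row)
        b.append(Fraction(reward))
    x = solve_linear(A, b)
    return x[0], [Fraction(0)] + x[1:]

def strategy_iteration(mdp, strategy=None):
    """Alternate exact evaluation and greedy improvement until stable.
    An initial strategy may be supplied, e.g. from domain knowledge."""
    strategy = list(strategy) if strategy is not None else [0] * len(mdp)
    while True:
        gain, bias = evaluate(mdp, strategy)
        improved = False
        for s, actions in enumerate(mdp):
            def q(a):
                reward, succ = actions[a]
                return reward + sum(Fraction(p) * bias[t] for t, p in succ.items())
            best = max(range(len(actions)), key=q)
            if q(best) > q(strategy[s]):   # switch only on strict improvement
                strategy[s] = best
                improved = True
        if not improved:
            return gain, strategy

# Hypothetical two-state MDP: mdp[state][action] = (reward, {successor: probability}).
half = Fraction(1, 2)
example_mdp = [
    [(1, {0: half, 1: half}), (0, {1: Fraction(1)})],
    [(0, {0: Fraction(1)}), (4, {0: half, 1: half})],
]
gain, strategy = strategy_iteration(example_mdp)
print(gain, strategy)
```

Each round solves a linear system exactly, which is the source of both the precision and the cost of the method: the system has one unknown per state, so naive evaluation becomes expensive on large models. General (multichain) MDPs additionally require per-state gains, which this sketch deliberately omits.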
Taxonomy
Topics: Formal Methods in Verification · Advanced Software Engineering Methodologies · Software Reliability and Analysis Research
