Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

Jan Křetínský; Tobias Meggendorfer

arXiv:1707.01859·cs.PF·September 8, 2017

TL;DR

This paper introduces techniques that speed up strategy iteration for mean payoff in Markov decision processes by orders of magnitude on many MDPs. They remove its performance disadvantage relative to value iteration while keeping its advantages, such as precision and domain-knowledge-aware initialization.

Contribution

The authors provide several techniques that make strategy iteration for mean payoff in MDPs competitive with value iteration, which previously scaled better.

Findings

01

Strategy iteration, which is rarely used for MDPs because it scales worse than value iteration, can be made scalable.

02

The proposed techniques speed up strategy iteration by orders of magnitude for many MDPs, eliminating its performance disadvantage against value iteration.

03

Strategy iteration keeps its advantages over value iteration, such as precision and the possibility of domain-knowledge-aware initialization.

Abstract

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance-related properties. Strategy iteration is one of the solution techniques applicable in this context. While in many other contexts it is the technique of choice due to advantages over e.g. value iteration, such as precision or possibility of domain-knowledge-aware initialization, it is rarely used for MDPs, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.
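
For readers unfamiliar with the technique, the following is a minimal sketch of textbook strategy iteration for mean payoff. It is not the authors' optimized method and includes none of the speed-ups the abstract refers to. It assumes a unichain MDP (every strategy induces a Markov chain with a single recurrent class), so the mean payoff is the same from every state. The MDP encoding, the function names (solve, evaluate, strategy_iteration), and the two-state EXAMPLE_MDP are all hypothetical and chosen only for illustration. Exact rational arithmetic is used to reflect the precision advantage mentioned above, and the optional strategy argument stands in for a domain-knowledge-aware initial strategy.

```python
from fractions import Fraction

# Hypothetical encoding: mdp[s] is a list of actions available in state s,
# each action is (reward, {successor_state: probability}).


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination over Fractions.

    Assumes the system is non-singular (true under the unichain assumption).
    """
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def evaluate(mdp, strategy):
    """Return (gain, bias) of a fixed strategy, normalizing bias[0] = 0.

    Solves gain + bias[s] = reward(s) + sum_t P(s, t) * bias[t] for all s.
    """
    n = len(mdp)
    a, b = [], []
    for s in range(n):
        reward, succ = mdp[s][strategy[s]]
        # Unknowns: x[0] is the gain, x[t] is the bias of state t for t >= 1.
        row = [Fraction(0)] * n
        row[0] = Fraction(1)
        if s != 0:
            row[s] += 1
        for t, p in succ.items():
            if t != 0:
                row[t] -= p
        a.append(row)
        b.append(Fraction(reward))
    x = solve(a, b)
    return x[0], [Fraction(0)] + x[1:]


def strategy_iteration(mdp, strategy=None):
    """Alternate evaluation and greedy improvement until no action changes."""
    strategy = list(strategy) if strategy is not None else [0] * len(mdp)
    while True:
        gain, bias = evaluate(mdp, strategy)

        def score(s, a):
            reward, succ = mdp[s][a]
            return reward + sum(p * bias[t] for t, p in succ.items())

        improved = False
        for s in range(len(mdp)):
            best = max(range(len(mdp[s])), key=lambda a: score(s, a))
            # Switch only on strict improvement so the loop terminates.
            if score(s, best) > score(s, strategy[s]):
                strategy[s] = best
                improved = True
        if not improved:
            return gain, strategy


HALF = Fraction(1, 2)
EXAMPLE_MDP = [
    [(1, {0: HALF, 1: HALF}), (0, {1: Fraction(1)})],  # actions in state 0
    [(0, {0: Fraction(1)}), (4, {0: HALF, 1: HALF})],  # actions in state 1
]

if __name__ == "__main__":
    print(strategy_iteration(EXAMPLE_MDP))  # (Fraction(8, 3), [1, 1])
```

On this example the sketch starts from the strategy that picks action 0 everywhere (mean payoff 2/3), improves it twice, and stops at the strategy that picks action 1 in both states, with mean payoff 8/3. Each round solves one linear system per strategy, which hints at why the plain method can scale poorly on large models.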

Taxonomy

Topics: Formal Methods in Verification · Advanced Software Engineering Methodologies · Software Reliability and Analysis Research