On the Complexity of Policy Iteration

Yishay Mansour; Satinder Singh

arXiv:1301.6718 · cs.AI · January 30, 2013 · 76 cites



TL;DR

This paper establishes the first non-trivial worst-case upper bounds on the number of iterations that policy iteration (PI) needs to find the optimal policy in a Markov decision process. The bounds do not depend on the discount factor.

Contribution

It provides the first complexity analysis of policy iteration whose bounds do not depend on the discount factor, and the analysis clarifies how the algorithm moves through the space of policies.

Findings

01. Derived the first non-trivial worst-case upper bounds on the complexity of PI.

02. Shed new light on how PI progresses through the space of policies, as a by-product of the analysis behind the bounds.

03. Bounded the number of iterations PI needs to converge to the optimal policy, independent of the value of the discount factor.

Abstract

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.
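For readers who want to see the quantity being bounded, the following is a minimal sketch of policy iteration that counts its improvement rounds. It is not taken from the paper: it assumes one common variant (switch every state whose action can be strictly improved), a discounted criterion, and exact rational arithmetic so that ties cannot be caused by rounding. The function names and the two-state MDP (`P`, `R`, `gamma`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (exact when entries are Fractions)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def evaluate(P, R, gamma, policy):
    """Value of a fixed policy: solve (I - gamma * P_pi) v = r_pi."""
    n = len(policy)
    A = [[(1 if s == t else 0) - gamma * P[s][policy[s]][t] for t in range(n)]
         for s in range(n)]
    b = [R[s][policy[s]] for s in range(n)]
    return solve(A, b)


def policy_iteration(P, R, gamma):
    """Return (policy, values, number of policy changes before convergence)."""
    n = len(P)
    policy = [0] * n          # arbitrary starting policy (assumption)
    iterations = 0
    while True:
        v = evaluate(P, R, gamma, policy)
        new_policy = policy[:]
        for s in range(n):
            def q(a):
                return R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
            best = max(range(len(P[s])), key=q)
            if q(best) > q(policy[s]):   # switch only on strict improvement
                new_policy[s] = best
        if new_policy == policy:
            return policy, v, iterations
        policy = new_policy
        iterations += 1


# Hypothetical 2-state, 2-action MDP: P[s][a][t] is a transition probability,
# R[s][a] an immediate reward.
P = [[[1, 0], [0, 1]],
     [[0, 1], [1, 0]]]
R = [[0, 0],
     [1, 2]]
gamma = Fraction(1, 2)

policy, values, iterations = policy_iteration(P, R, gamma)
print(policy, values, iterations)
```

Each round strictly improves the policy, and there are finitely many deterministic policies (the product of the per-state action counts), so the loop terminates. That count is the trivial bound on `iterations`; the paper's interest is in bounds tighter than this that still hold for any discount factor.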


Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Formal Methods in Verification