Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Bruno Scherrer (BIGS)

arXiv:1306.0386·math.OC·February 11, 2016

TL;DR

This paper gives improved upper bounds on the number of iterations that Policy Iteration algorithms need to solve a Markov Decision Process. It covers bounds that depend on the discount factor $\gamma$ as well as discount-independent bounds that hold under structural assumptions on the MDP, under which Simplex-PI is strongly polynomial.

Contribution

It introduces tighter bounds for Howard's PI and Simplex-PI, including discount-independent bounds based on structural properties of the MDP, and extends these results to broader classes of MDPs.

Findings

01

Howard's PI terminates after at most $O\left(\frac{m}{1-\gamma} \log \frac{1}{1-\gamma}\right)$ iterations.

02

Simplex-PI terminates after at most $O\left(\frac{nm}{1-\gamma} \log \frac{1}{1-\gamma}\right)$ iterations.

03

Under structural assumptions on the MDP, Simplex-PI is strongly polynomial, and discount-independent bounds are provided for both algorithms.

Abstract

Given a Markov Decision Process (MDP) with $n$ states and a total number $m$ of actions, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal $\gamma$-discounted policy. We consider two variations of PI: Howard's PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal advantage. We show that Howard's PI terminates after at most $O\left(\frac{m}{1-\gamma} \log \frac{1}{1-\gamma}\right)$ iterations, improving by a factor $O(\log n)$ a result by Hansen et al., while Simplex-PI terminates after at most $O\left(\frac{nm}{1-\gamma} \log \frac{1}{1-\gamma}\right)$ iterations, improving by a factor $O(\log n)$ a result by Ye. Under some structural properties of the MDP, we then consider bounds that are independent of the discount factor $\gamma$:…
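The difference between the two variants described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the two-state example MDP, the tolerance `tol`, and the use of iterated fixed-point sweeps for policy evaluation are all assumptions made for illustration, and for simplicity every state has the same number of actions.

```python
def evaluate(P, r, policy, gamma, sweeps=2000):
    """Approximate the value of `policy` by iterating v <- r_pi + gamma * P_pi v."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            r[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n)
        ]
    return v


def advantages(P, r, policy, gamma):
    """Return adv[s][a] = Q(s, a) - v(s) for the current policy."""
    v = evaluate(P, r, policy, gamma)
    adv = []
    for s in range(len(policy)):
        row = []
        for a in range(len(r[s])):
            q = r[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
            row.append(q - v[s])
        adv.append(row)
    return adv


def policy_iteration(P, r, gamma, variant="howard", tol=1e-9):
    """Run PI from the all-zeros policy; return (policy, number of iterations)."""
    n = len(r)
    policy = [0] * n
    iterations = 0
    while True:
        adv = advantages(P, r, policy, gamma)
        # Best action and its advantage in every state.
        best = [max(range(len(adv[s])), key=lambda a: adv[s][a]) for s in range(n)]
        gains = [adv[s][best[s]] for s in range(n)]
        if max(gains) <= tol:  # no positive advantage left: stop
            return policy, iterations
        if variant == "howard":
            # Howard's PI: switch every state that has a positive advantage.
            policy = [best[s] if gains[s] > tol else policy[s] for s in range(n)]
        else:
            # Simplex-PI: switch only the state with maximal advantage.
            s_star = max(range(n), key=lambda s: gains[s])
            policy[s_star] = best[s_star]
        iterations += 1


# Hypothetical toy MDP: P[s][a] is the next-state distribution, r[s][a] the reward.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
r = [[0.0, 0.0], [0.0, 1.0]]

howard_policy, howard_iters = policy_iteration(P, r, gamma=0.9, variant="howard")
simplex_policy, simplex_iters = policy_iteration(P, r, gamma=0.9, variant="simplex")
print(howard_policy, howard_iters)
print(simplex_policy, simplex_iters)
```

Both variants share the evaluation and stopping test and differ only in the switching rule, which is the distinction the iteration bounds are stated for. A production implementation would more likely evaluate each policy exactly by solving the linear system rather than by repeated sweeps.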
