Analysis of Lower Bounds for Simple Policy Iteration
Sarthak Consul, Bhishma Dedhia, Kumar Ashutosh, Parthasarathi Khirwadkar

TL;DR
This paper extends previous exponential lower bounds on the number of iterations for simple policy iteration algorithms in Markov Decision Processes, now applicable to k-action, N-state MDPs, using a novel construction and analysis.
Contribution
It generalizes earlier bounds to k-action MDPs and introduces a new family of MDPs with an index-based switching rule demonstrating the exponential lower bound.
Findings
Established a lower bound of O((3+k)2^{N/2-3}) iterations for k-action MDPs.
Constructed a family of MDPs demonstrating the bound.
Generalized previous results from 2-action to k-action MDPs.
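To get a feel for how the stated bound grows, the expression (3+k)2^{N/2-3} can be evaluated directly. The snippet below is only an arithmetic illustration of the formula as quoted above; the function name `spi_bound_expression` is hypothetical, and any constant factors hidden by the asymptotic notation are ignored.

```python
def spi_bound_expression(n_states: int, k_actions: int) -> float:
    """Evaluate (3 + k) * 2^(N/2 - 3), the expression quoted in the findings.

    This is the raw formula only; constants hidden by the asymptotic
    notation are not modeled.
    """
    return (3 + k_actions) * 2 ** (n_states / 2 - 3)


if __name__ == "__main__":
    # Adding two states doubles the value; adding an action adds 2^(N/2 - 3).
    for n in (10, 12, 20):
        for k in (2, 3):
            print(f"N={n:2d}, k={k}: {spi_bound_expression(n, k):g}")
```

For a fixed k the value doubles every time N grows by 2, which is the exponential dependence on the number of states that the findings describe.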
Abstract
Policy iteration is a family of algorithms that are used to find an optimal policy for a given Markov Decision Problem (MDP). Simple policy iteration (SPI) is a type of policy iteration where the strategy is to change the policy at exactly one improvable state at every step. Melekopoglou and Condon [1990] showed an exponential lower bound on the number of iterations taken by SPI for 2-action MDPs. The results have not been generalized to k-action MDPs since. In this paper, we revisit the algorithm and the analysis done by Melekopoglou and Condon. We generalize the previous result and prove a novel exponential lower bound on the number of iterations taken by policy iteration for N-state, k-action MDPs. We construct a family of MDPs and give an index-based switching rule that yields a strong lower bound of O((3+k)2^{N/2-3}).
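The one-state-per-step behaviour of SPI described in the abstract can be sketched as follows. This is a minimal illustration and not the paper's construction. It makes several assumptions: a discounted MDP with `gamma = 0.9`, policy evaluation by repeated sweeps, a switching rule that picks the highest-index improvable state, and a greedy choice of the new action. The paper's own index-based rule may differ, and all function names here are hypothetical.

```python
def evaluate(P, R, policy, gamma, sweeps=2000):
    """Approximate V^pi by repeated Bellman backups (assumes gamma < 1)."""
    n = len(policy)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [
            R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n))
            for s in range(n)
        ]
    return V


def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))


def simple_policy_iteration(P, R, gamma=0.9, policy=None, tol=1e-9):
    """Switch exactly one improvable state per iteration.

    P[s][a] is a list of next-state probabilities, R[s][a] a reward.
    Returns (policy, values, number_of_switches).
    """
    n = len(P)
    policy = list(policy) if policy is not None else [0] * n
    switches = 0
    while True:
        V = evaluate(P, R, policy, gamma)
        improvable = [
            s for s in range(n)
            if max(q_value(P, R, V, s, a, gamma) for a in range(len(P[s])))
            > V[s] + tol
        ]
        if not improvable:
            return policy, V, switches
        s = max(improvable)  # assumed rule: highest-index improvable state
        # Assumed action choice: greedy with respect to the current values.
        policy[s] = max(
            range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, gamma)
        )
        switches += 1


# Toy 2-state, 2-action MDP (made up for illustration).
# State 0: action 0 stays put, action 1 moves to state 1; both pay 0.
# State 1: both actions stay put; action 0 pays 0, action 1 pays 1.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
R = [[0.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    policy, V, switches = simple_policy_iteration(P, R, policy=[0, 0])
    print(policy, switches)
```

On this toy MDP, starting from the policy `[0, 0]`, only state 1 is improvable at first. State 0 becomes improvable only after state 1 has been switched, so the run takes two single-state switches. Lower-bound constructions exploit this kind of dependency between states on a much larger scale.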
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms
