Analysis of Lower Bounds for Simple Policy Iteration

Sarthak Consul; Bhishma Dedhia; Kumar Ashutosh; Parthasarathi Khirwadkar

arXiv:1911.12842 · cs.LG · December 2, 2019

TL;DR

This paper extends the known exponential lower bound on the number of iterations taken by simple policy iteration from 2-action Markov Decision Processes to N-state, k-action MDPs, using a new MDP construction and analysis.

Contribution

The paper generalizes the earlier 2-action bound to k-action MDPs and introduces a new family of MDPs, together with an index-based switching rule, on which simple policy iteration exhibits the exponential lower bound.

Findings

01

Established a lower bound of $O((3 + k) 2^{N/2 - 3})$ iterations for $N$-state, $k$-action MDPs.

02

Constructed a family of MDPs demonstrating the bound.

03

Generalized previous results from 2-action to k-action MDPs.
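
To give a sense of scale, the bound in item 01 can be evaluated at a couple of sizes. The values of $k$ and $N$ below are chosen purely for illustration and do not come from the paper:

$$(3 + k)\,2^{N/2 - 3}\,\Big|_{k=2,\ N=10} = 5 \cdot 2^{2} = 20, \qquad (3 + k)\,2^{N/2 - 3}\,\Big|_{k=5,\ N=20} = 8 \cdot 2^{7} = 1024.$$

The expression grows linearly in $k$ and doubles each time $N$ increases by 2.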

Abstract

Policy iteration is a family of algorithms that are used to find an optimal policy for a given Markov Decision Problem (MDP). Simple policy iteration (SPI) is a type of policy iteration where the strategy is to change the policy at exactly one improvable state at every step. Melekopoglou and Condon [1990] showed an exponential lower bound on the number of iterations taken by SPI for a 2-action MDP. The results have not been generalized to $k$-action MDPs since. In this paper, we revisit the algorithm and the analysis done by Melekopoglou and Condon. We generalize the previous result and prove a novel exponential lower bound on the number of iterations taken by policy iteration for $N$-state, $k$-action MDPs. We construct a family of MDPs and give an index-based switching rule that yields a strong lower bound of $O((3 + k) 2^{N/2 - 3})$.
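
The SPI loop described in the abstract (evaluate the current policy, then switch the action at exactly one improvable state) can be illustrated with a minimal Python sketch. Several choices here are assumptions for illustration and are not taken from the paper: the discounted-reward setting with `gamma`, the array layout `P[a][s][t]` and `R[a][s]`, the iterative evaluation routine, and in particular the switching rule (highest-indexed improvable state, lowest-indexed improving action), which is a stand-in and not necessarily the index-based rule the authors analyze.

```python
from typing import List, Tuple

# Assumed layout (hypothetical, for illustration only):
# P[a][s][t] = probability of moving from state s to state t under action a
# R[a][s]    = expected immediate reward for taking action a in state s
Transitions = List[List[List[float]]]
Rewards = List[List[float]]


def q_value(P: Transitions, R: Rewards, v: List[float],
            a: int, s: int, gamma: float) -> float:
    """One-step lookahead value of taking action a in state s, then following v."""
    return R[a][s] + gamma * sum(p * v[t] for t, p in enumerate(P[a][s]))


def evaluate(P: Transitions, R: Rewards, policy: List[int],
             gamma: float, eps: float = 1e-12) -> List[float]:
    """Iterative policy evaluation: repeat the Bellman backup until it stabilizes."""
    v = [0.0] * len(policy)
    while True:
        new_v = [q_value(P, R, v, policy[s], s, gamma) for s in range(len(policy))]
        if max(abs(x - y) for x, y in zip(new_v, v)) < eps:
            return new_v
        v = new_v


def simple_policy_iteration(P: Transitions, R: Rewards, gamma: float = 0.9,
                            tol: float = 1e-9) -> Tuple[List[int], List[float], int]:
    """Switch exactly one improvable state per step; return policy, values, switch count."""
    k, n = len(P), len(P[0])
    policy = [0] * n  # assumed starting policy: action 0 everywhere
    switches = 0
    while True:
        v = evaluate(P, R, policy, gamma)
        # For each state, collect the actions that strictly improve on its current value.
        improving = {
            s: [a for a in range(k) if q_value(P, R, v, a, s, gamma) > v[s] + tol]
            for s in range(n)
        }
        improvable = [s for s in range(n) if improving[s]]
        if not improvable:
            return policy, v, switches  # no improvable state: policy is optimal
        # Stand-in switching rule (an assumption, not the paper's rule):
        # highest-indexed improvable state, lowest-indexed improving action.
        s = max(improvable)
        policy[s] = improving[s][0]
        switches += 1
```

The returned switch count is the quantity the lower bound concerns: on ordinary MDPs it is small, while the paper's constructed family is designed so that it grows exponentially in the number of states under the authors' rule.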

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms