Convergence Analysis of Policy Iteration

Ali Heydari

arXiv:1505.05216 · cs.SY · May 21, 2015 · 1 citation

TL;DR

This paper analyzes the convergence properties of policy iteration in optimal control of nonlinear systems, establishing conditions for convergence and optimality, and comparing its speed to value iteration, including multi-step look-ahead extensions.

Contribution

It provides a rigorous convergence analysis of policy iteration starting from a stabilizing control, and compares its convergence speed to value iteration, extending results to multi-step look-ahead methods.

Findings

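01. Policy iteration initiated with a stabilizing control converges, and the limit functions are optimal because the solution to the Bellman equation is shown to be unique.

02. The convergence speed of policy iteration is compared theoretically with that of value iteration.

03. The convergence results are extended to multi-step look-ahead policy iteration.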

Abstract

Adaptive optimal control of nonlinear dynamic systems with deterministic and known dynamics under a known undiscounted infinite-horizon cost function is investigated. The policy iteration scheme, initiated using a stabilizing initial control, is analyzed as a means of solving the problem. The main results of this study are the convergence of the iterations and the optimality of the limit functions, the latter following from the established uniqueness of the solution to the Bellman equation. Furthermore, a theoretical comparison between the convergence speed of policy iteration and that of value iteration is presented. Finally, the convergence results are extended to the case of multi-step look-ahead policy iteration.
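
To make the scheme in the abstract concrete, here is a minimal Python sketch of policy iteration on a toy problem. Everything in it is an illustrative assumption and none of it is taken from the paper: the finite state line, the two controls, the integer costs, and the function names `f`, `cost`, `evaluate`, `improve`, and `policy_iteration` are all hypothetical. The paper treats general nonlinear systems; this sketch only mirrors the overall structure (known deterministic dynamics, undiscounted cost, a stabilizing initial control, alternating evaluation and improvement).

```python
# Toy deterministic problem (hypothetical, not from the paper):
# states 0..N-1 on a line; state 0 is a zero-cost absorbing goal.
N = 8
CONTROLS = (1, 2)           # step sizes toward the goal
STEP_COST = {1: 2, 2: 3}    # undiscounted running cost of each control


def f(x, u):
    """Known deterministic dynamics: move u cells toward state 0."""
    return max(x - u, 0)


def cost(x, u):
    """Running cost; zero at the goal so the infinite-horizon sum is finite."""
    return 0 if x == 0 else STEP_COST[u]


def evaluate(policy):
    """Policy evaluation: solve V(x) = cost(x, h(x)) + V(f(x, h(x))), V(0) = 0."""
    V = [0] * N
    for x in range(1, N):   # f(x, u) < x here, so V[f(x, u)] is already final
        u = policy[x]
        V[x] = cost(x, u) + V[f(x, u)]
    return V


def improve(V):
    """Policy improvement: pick the greedy control with respect to V."""
    return [min(CONTROLS, key=lambda u: cost(x, u) + V[f(x, u)])
            for x in range(N)]


def policy_iteration(policy):
    """Alternate evaluation and improvement until the value function settles."""
    V = evaluate(policy)
    history = [V]
    while True:
        new_policy = improve(V)
        new_V = evaluate(new_policy)
        if new_V == V:      # integer costs, so exact comparison is safe
            return new_policy, V, history
        policy, V = new_policy, new_V
        history.append(V)


# "Stabilizing" initial control in this toy setting: always step by 1,
# which reaches the goal from every state with finite total cost.
policy, V, history = policy_iteration([1] * N)
print(V)
```

In this toy run the value functions in `history` do not increase from one iteration to the next, and the final `V` satisfies the Bellman equation V(x) = min over u of [cost(x, u) + V(f(x, u))], which is the sense in which the limit is optimal here. Stopping on an unchanged value function rather than an unchanged policy is a design choice made for this sketch, so that ties between equally good controls cannot keep the loop running.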

Taxonomy

Topics: Adaptive Dynamic Programming Control · Frequency Control in Power Systems · Optimization and Variational Analysis