Approximate Policy Iteration Schemes: A Comparison

Bruno Scherrer (INRIA Nancy - Grand Est / LORIA)

arXiv:1405.2878 · cs.AI · May 13, 2014 · 36 cites

TL;DR

This paper compares various approximate policy iteration algorithms for Markov Decision Processes, analyzing their performance bounds, iteration complexity, and memory requirements, and providing insights into their trade-offs and practical implications.

Contribution

It offers a comprehensive comparison of several approximate policy iteration schemes, highlighting their performance guarantees, iteration bounds, and memory trade-offs, with new analysis on Non-Stationary Policy iteration.

Findings

01

CPI's performance guarantee can be arbitrarily better than that of API, but CPI needs a number of iterations that is exponentially larger in $\frac{1}{\epsilon}$.

02

PSDP$_{\infty}$ balances performance guarantees and iteration count.

03

NSPI(m) offers a trade-off between memory usage and performance.

Abstract

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration (API), Conservative Policy Iteration (CPI), a natural adaptation of the Policy Search by Dynamic Programming algorithm to the infinite-horizon case (PSDP$_{\infty}$), and the recently proposed Non-Stationary Policy iteration (NSPI(m)). For all algorithms, we describe performance bounds and compare them, paying particular attention to the concentrability constants involved, the number of iterations, and the memory required. Our analysis highlights the following points: 1) The performance guarantee of CPI can be arbitrarily better than that of API/API($\alpha$), but this comes at the cost of a relative increase, exponential in $\frac{1}{\epsilon}$, of the number of…
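To make the contrast between a full greedy step and a conservative step concrete, here is a minimal sketch on a tabular MDP. It is an illustration under stated assumptions, not the paper's algorithms: policy evaluation is done by plain iteration with no approximation error, the step size `alpha` is fixed rather than chosen adaptively, and the function names (`evaluate`, `greedy`, `iterate`) and the nested-list encoding `P[s][a][t]`, `R[s][a]` are hypothetical choices made for this example.

```python
def evaluate(P, R, pi, gamma, sweeps=500):
    """Estimate the value of stochastic policy pi by repeatedly
    applying its Bellman operator, starting from zero."""
    n = len(R)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            sum(
                pi[s][a] * (R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n)))
                for a in range(len(R[s]))
            )
            for s in range(n)
        ]
    return v


def greedy(P, R, v, gamma):
    """One-hot policy that picks, in each state, an action
    maximizing the one-step lookahead with respect to v."""
    n = len(R)
    pi = []
    for s in range(n):
        q = [
            R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
            for a in range(len(R[s]))
        ]
        best = q.index(max(q))
        pi.append([1.0 if a == best else 0.0 for a in range(len(q))])
    return pi


def iterate(P, R, gamma, alpha, iters=30):
    """alpha = 1.0 replaces the policy by the greedy one at each step;
    alpha < 1.0 moves only part of the way (a conservative mixture)."""
    n, m = len(R), len(R[0])
    pi = [[1.0 / m] * m for _ in range(n)]  # start from the uniform policy
    for _ in range(iters):
        g = greedy(P, R, evaluate(P, R, pi, gamma), gamma)
        pi = [
            [(1 - alpha) * pi[s][a] + alpha * g[s][a] for a in range(m)]
            for s in range(n)
        ]
    return pi, evaluate(P, R, pi, gamma)
```

With `alpha = 1.0` each iterate is a deterministic policy, while smaller values of `alpha` produce stochastic mixtures that change slowly from one iteration to the next. This is the kind of trade-off between step size and number of iterations that the comparison in the abstract is concerned with.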

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms