Lambda-Policy Iteration: A Review and a New Implementation

Dimitri P. Bertsekas

arXiv:1507.01029 · cs.SY · July 7, 2015 · 2 cites

TL;DR

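This paper reviews lambda-policy iteration, a dynamic programming method intermediate between value iteration and policy iteration, and introduces a new implementation based on geometric sampling, a simulation scheme that uses multiple short trajectories instead of a single infinitely long one.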

Contribution

It provides a comprehensive review of lambda-policy iteration and proposes a novel geometric sampling approach for more effective simulation-based policy evaluation.

Findings

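01. The implementations discussed offer advantages over well-established policy iteration methods that use LSPE($\lambda$), LSTD($\lambda$), or TD($\lambda$) for policy evaluation with cost function approximation.

02. Geometric sampling evaluates a policy from multiple short simulated trajectories rather than a single infinitely long trajectory.

03. The paper reviews the theory of lambda-policy iteration, including questions of bias and exploration that arise in simulation-based cost function approximation.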
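The short-trajectory idea in finding 02 can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes a fixed policy with transition matrix `P_mu` and expected one-stage cost vector `g_mu`, a discount factor `alpha`, and a current cost estimate `J`; all of these names, and the function `geometric_sampling_estimate`, are hypothetical. Each trajectory runs for a geometrically distributed number of transitions (probability `(1 - lam) * lam**(k - 1)` of length `k`) and then adds the discounted value of `J` at the state where it stops, so the sample average estimates the multistep mapping $(1-\lambda)\sum_{\ell \ge 0} \lambda^{\ell} T_\mu^{\ell+1} J$ at the start state. How the paper combines such samples with cost function approximation is not reproduced here.

```python
import numpy as np

def geometric_sampling_estimate(P_mu, g_mu, J, alpha, lam, start, n_traj, rng):
    """Monte Carlo estimate of the lambda-weighted multistep mapping at `start`.

    Assumed (hypothetical) inputs: P_mu is the n x n transition matrix of a
    fixed policy, g_mu the expected one-stage cost per state, J the current
    cost estimate, alpha the discount factor, lam in [0, 1).
    """
    n = len(J)
    total = 0.0
    for _ in range(n_traj):
        # Trajectory length k >= 1 with probability (1 - lam) * lam**(k - 1).
        length = rng.geometric(1.0 - lam)
        i, cost, disc = start, 0.0, 1.0
        for _ in range(length):
            cost += disc * g_mu[i]         # discounted one-stage cost at state i
            i = rng.choice(n, p=P_mu[i])   # simulate one transition under the policy
            disc *= alpha
        # Close the short trajectory with the current estimate at its last state.
        total += cost + disc * J[i]
    return total / n_traj
```

With `lam = 0.5` the average trajectory has only two transitions, which is the sense in which the trajectories are short. For a small problem the estimate can be checked against the exact value obtained by solving the linear system `(I - lam * alpha * P_mu) x = g_mu + (1 - lam) * alpha * P_mu @ J`.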

Abstract

In this paper we discuss $\lambda$-policy iteration, a method for exact and approximate dynamic programming. It is intermediate between the classical value iteration (VI) and policy iteration (PI) methods, and it is closely related to optimistic (also known as modified) PI, whereby each policy evaluation is done approximately, using a finite number of VI. We review the theory of the method and associated questions of bias and exploration arising in simulation-based cost function approximation. We then discuss various implementations, which offer advantages over well-established PI methods that use LSPE($\lambda$), LSTD($\lambda$), or TD($\lambda$) for policy evaluation with cost function approximation. One of these implementations is based on a new simulation scheme, called geometric sampling, which uses multiple short trajectories rather than a single infinitely long trajectory.
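To make the "intermediate between VI and PI" description concrete, here is a minimal sketch of exact (tabular) lambda-policy iteration for a small discounted, cost-minimizing problem. It is an illustration under stated assumptions, not the paper's implementation: the array layout (`P[a]` as the transition matrix of action `a`, `g[a]` as its expected one-stage cost vector) and the function name `lambda_policy_iteration` are hypothetical. Each iteration picks a policy that is greedy with respect to the current cost vector `J`, then replaces `J` by the solution of `J_new = g_mu + alpha * P_mu @ ((1 - lam) * J + lam * J_new)`.

```python
import numpy as np

def lambda_policy_iteration(P, g, alpha, lam, iters=200):
    """Tabular lambda-PI sketch for a discounted cost-minimization problem.

    Assumed (hypothetical) layout: P has shape (n_actions, n, n), g has shape
    (n_actions, n), alpha is the discount factor, lam is in [0, 1).
    """
    n_actions, n = g.shape
    states = np.arange(n)
    J = np.zeros(n)
    mu = np.zeros(n, dtype=int)
    for _ in range(iters):
        # Policy improvement: mu is greedy with respect to the current J.
        Q = g + alpha * (P @ J)            # shape (n_actions, n)
        mu = Q.argmin(axis=0)
        P_mu = P[mu, states]               # row i is the row of P[mu[i]] for state i
        g_mu = g[mu, states]
        # Approximate evaluation of mu: solve
        #   J_new = g_mu + alpha * P_mu @ ((1 - lam) * J + lam * J_new).
        A = np.eye(n) - lam * alpha * P_mu
        b = g_mu + (1.0 - lam) * alpha * (P_mu @ J)
        J = np.linalg.solve(A, b)
    return J, mu
```

In this sketch `lam = 0` reduces each iteration to one VI step, while values of `lam` close to 1 make each evaluation close to the exact policy evaluation used in PI.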

Taxonomy

Topics: Reinforcement Learning in Robotics