Lambda-Policy Iteration: A Review and a New Implementation
Dimitri P. Bertsekas

TL;DR
This paper reviews lambda-policy iteration, a flexible dynamic programming method, and introduces a new implementation using geometric sampling to improve policy evaluation efficiency.
Contribution
It provides a comprehensive review of lambda-policy iteration and proposes a novel geometric sampling approach for more effective simulation-based policy evaluation.
Findings
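The implementations discussed offer advantages over well-established PI methods that use LSPE(λ), LSTD(λ), or TD(λ) for policy evaluation with cost function approximation.
Geometric sampling uses multiple short trajectories rather than a single infinitely long trajectory.
The paper reviews the theory of the method and the associated questions of bias and exploration that arise in simulation-based cost function approximation.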
Abstract
In this paper we discuss λ-policy iteration, a method for exact and approximate dynamic programming. It is intermediate between the classical value iteration (VI) and policy iteration (PI) methods, and it is closely related to optimistic (also known as modified) PI, whereby each policy evaluation is done approximately, using a finite number of VI. We review the theory of the method and associated questions of bias and exploration arising in simulation-based cost function approximation. We then discuss various implementations, which offer advantages over well-established PI methods that use LSPE(λ), LSTD(λ), or TD(λ) for policy evaluation with cost function approximation. One of these implementations is based on a new simulation scheme, called geometric sampling, which uses multiple short trajectories rather than a single infinitely long trajectory.
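To make the abstract concrete, here is a minimal Python sketch of exact (lookup-table) λ-policy iteration on a small discounted MDP. It is an illustration under stated assumptions, not the paper's implementation. It assumes the standard multistep mapping T_μ^(λ) = (1 − λ) Σ_{l≥0} λ^l T_μ^(l+1), which for a fixed policy with transition matrix P_μ, cost vector g_μ, and discount factor α equals (I − λαP_μ)^(-1) (g_μ + (1 − λ)αP_μ J). Setting λ = 0 gives one VI step and λ = 1 gives full policy evaluation as in PI. The second function reads "geometric sampling" as terminating each simulated trajectory with probability 1 − λ after every transition, which gives an unbiased Monte Carlo estimate of one component of T_μ^(λ) J; the paper applies the scheme with cost function approximation, which this sketch omits. All function names, array layouts, and the example MDP are hypothetical.

```python
import numpy as np


def lambda_operator(P_mu, g_mu, J, alpha, lam):
    """Apply T_mu^(lam) to J for a fixed policy (exact, lookup-table case)."""
    n = len(J)
    lhs = np.eye(n) - lam * alpha * P_mu
    rhs = g_mu + (1.0 - lam) * alpha * P_mu @ J
    return np.linalg.solve(lhs, rhs)


def lambda_policy_iteration(P, g, alpha, lam, num_iters=200):
    """Exact lambda-policy iteration (assumed layout, not from the paper).

    P: array (A, S, S), P[a, s, t] = probability of s -> t under action a.
    g: array (A, S), expected one-stage cost of action a in state s.
    """
    num_states = P.shape[1]
    states = np.arange(num_states)
    J = np.zeros(num_states)
    mu = np.zeros(num_states, dtype=int)
    for _ in range(num_iters):
        # Policy improvement: greedy (cost-minimizing) policy with respect to J.
        Q = g + alpha * (P @ J)          # shape (A, S)
        mu = Q.argmin(axis=0)
        P_mu = P[mu, states]             # row s is P[mu[s], s, :]
        g_mu = g[mu, states]
        # Approximate policy evaluation: a single application of T_mu^(lam).
        J = lambda_operator(P_mu, g_mu, J, alpha, lam)
    return J, mu


def geometric_sample_estimate(P_mu, g_mu, J, alpha, lam, start, num_traj, rng):
    """Monte Carlo estimate of (T_mu^(lam) J)(start) from short trajectories.

    Assumption: after every transition the trajectory stops with
    probability 1 - lam, so its length is geometrically distributed.
    """
    num_states = len(J)
    total = 0.0
    for _ in range(num_traj):
        state, discount, cost = start, 1.0, 0.0
        while True:
            cost += discount * g_mu[state]
            state = rng.choice(num_states, p=P_mu[state])
            discount *= alpha
            if rng.random() >= lam:      # terminate with probability 1 - lam
                break
        # Terminal correction: discounted current estimate at the last state.
        total += cost + discount * J[state]
    return total / num_traj


# Hypothetical 2-state, 2-action example.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
g = np.array([[1.0, 2.0],
              [1.5, 0.5]])
J_star, mu_star = lambda_policy_iteration(P, g, alpha=0.9, lam=0.5)
```

In this sketch every λ in [0, 1] should reach the same fixed point; λ only changes how much evaluation work each iteration does. The expected trajectory length in the sampler is 1/(1 − λ), which is why the simulated trajectories stay short when λ is well below 1.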
Taxonomy
Topics: Reinforcement Learning in Robotics
