Off-policy Learning with Eligibility Traces: A Survey

Matthieu Geist; Bruno Scherrer (INRIA Lorraine - LORIA)

arXiv:1304.3999 · cs.AI · April 16, 2013 · 39 cites

Open Access

TL;DR

This survey reviews off-policy learning algorithms with eligibility traces in Markov Decision Processes, unifying existing methods, proposing new extensions, and empirically comparing their performance on benchmark problems.

Contribution

It provides a systematic derivation of off-policy algorithms with eligibility traces, introduces new methods, and evaluates their empirical performance.

Findings

01

Off-policy LSTD(λ) and LSPE(λ) perform best in experiments.

02

TD(λ) is effective when the feature space is large.

03

Most algorithms show convergence properties discussed in the literature.

Abstract

In the framework of Markov Decision Processes, we consider off-policy learning, that is, the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We briefly review on-policy learning algorithms from the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. Then, we highlight a systematic approach for adapting them to off-policy learning with eligibility traces. This leads to some known algorithms - off-policy LSTD(\lambda), LSPE(\lambda), TD(\lambda), TDC/GQ(\lambda) - and suggests new extensions - off-policy FPKF(\lambda), BRM(\lambda), gBRM(\lambda), GTD2(\lambda). We describe a comprehensive algorithmic derivation of all algorithms in a recursive and memory-efficient form, discuss their known convergence properties and illustrate their relative empirical behavior on…
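
The setting in the abstract (a linear value estimate for a target policy, learned from a trajectory generated by a different behavior policy, with eligibility traces) can be illustrated with a minimal sketch of off-policy TD(\lambda) using per-step importance weights. This is one common form of the update from the literature and may differ in detail from the survey's own derivation. The function names, the transition format `(phi, rho, reward, phi_next)`, and the one-state toy problem are assumptions made for illustration only.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def off_policy_td_lambda(transitions, n_features, gamma=0.9, lam=0.5, alpha=0.05):
    """Sketch of off-policy TD(lambda) with a linear value function.

    transitions: iterable of (phi, rho, reward, phi_next), where rho is the
    importance weight pi(a|s) / b(a|s) of the action taken (assumed format).
    """
    theta = [0.0] * n_features   # weights of the linear approximation
    z = [0.0] * n_features       # eligibility trace
    prev_rho = 1.0               # irrelevant at the first step because z == 0
    for phi, rho, reward, phi_next in transitions:
        # Decay the trace by gamma * lambda * previous importance weight,
        # then add the current features.
        z = [gamma * lam * prev_rho * zi + p for zi, p in zip(z, phi)]
        # Temporal-difference error under the current weights.
        delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
        # Importance-weighted update along the trace.
        theta = [t + alpha * rho * delta * zi for t, zi in zip(theta, z)]
        prev_rho = rho
    return theta


def toy_transitions(n_steps, seed=0):
    """Hypothetical one-state problem with two actions.

    Behavior policy: uniform over actions 0 and 1.
    Target policy: always action 0 (reward 1); action 1 gives reward 0.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_steps):
        action = 0 if rng.random() < 0.5 else 1
        rho = 2.0 if action == 0 else 0.0   # pi(a) / b(a)
        reward = 1.0 if action == 0 else 0.0
        out.append(([1.0], rho, reward, [1.0]))
    return out


theta = off_policy_td_lambda(toy_transitions(2000), n_features=1,
                             gamma=0.5, lam=0.5, alpha=0.05)
```

In this toy problem the target policy always collects reward 1, so its value is 1 / (1 - gamma) = 2 for gamma = 0.5, and the learned weight `theta[0]` approaches that value even though half of the sampled actions come from outside the target policy. Steps with `rho = 0` leave the weights unchanged and reset the trace on the following step.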

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Machine Learning and Data Classification