True Online Emphatic TD(λ): Quick Reference and Implementation Guide

Richard S. Sutton

arXiv:1507.07147·cs.LG·July 28, 2015

TL;DR

This paper is a concise guide to implementing true online emphatic TD(λ), a model-free temporal-difference algorithm for learning long-term predictions that combines the emphasis idea with the true-online idea.

Contribution

It gives implementation guidance for true online emphatic TD(λ) in a general setting: linear function approximation, possible off-policy training, general value functions, and the emphasis algorithm's notion of "interest".

Findings

01

Learns long-term predictions with linear function approximation.

02

Supports off-policy training and the emphasis algorithm's notion of "interest".

03

Combines the emphasis idea with the true-online idea in a single algorithm.

Abstract

This document is a guide to the implementation of true online emphatic TD(λ), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and the true-online idea (van Seijen & Sutton 2014). The setting used here includes linear function approximation, the possibility of off-policy training, and all the generality of general value functions, as well as the emphasis algorithm's notion of "interest".
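The abstract describes the algorithm only in words. The following pure-Python sketch illustrates one learning step under stated assumptions: it uses the per-step update equations in the form generally given for true online emphatic TD(λ) (TD error δ_t, followon trace F_t, emphasis M_t, an α-scaled emphatic eligibility trace e_t, and the true-online correction term), and these should be checked against the equations in the paper itself before use. The class `TrueOnlineEmphaticTD`, the `step` signature, and the caching of θ_{t−1}ᵀφ_t between calls are hypothetical choices made here for illustration, not code from the paper; the caching assumes consecutive calls form one continuous trajectory, with episode boundaries signalled by a discount of 0.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class TrueOnlineEmphaticTD:
    """Hypothetical sketch of a linear true online emphatic TD(lambda) learner.

    One call to step() processes the transition S_t -> S_{t+1}. Assumed inputs:
      phi, phi_next     feature vectors of S_t and S_{t+1}
      reward            R_{t+1}
      rho               importance-sampling ratio rho_t = pi(A_t|S_t) / mu(A_t|S_t)
      gamma, gamma_next discounts gamma_t and gamma_{t+1} (0 marks an episode boundary)
      lam               bootstrapping parameter lambda_t
      interest          interest I_t
    """

    def __init__(self, n_features: int, alpha: float) -> None:
        self.alpha = alpha
        self.theta: List[float] = [0.0] * n_features  # weights theta_t
        self.e: List[float] = [0.0] * n_features      # trace e_{t-1}, with e_{-1} = 0
        self.F = 0.0         # followon trace F_{t-1}, with F_{-1} = 0
        self.rho_prev = 1.0  # rho_{t-1}; irrelevant at t = 0 because F_{-1} = 0
        self.v_old = 0.0     # cached theta_{t-1}^T phi_t from the previous call

    def step(self, phi: Sequence[float], reward: float, phi_next: Sequence[float],
             rho: float, gamma: float, gamma_next: float, lam: float,
             interest: float) -> float:
        a = self.alpha
        # Followon trace: F_t = rho_{t-1} gamma_t F_{t-1} + I_t
        self.F = self.rho_prev * gamma * self.F + interest
        # Emphasis: M_t = lambda_t I_t + (1 - lambda_t) F_t
        m = lam * interest + (1.0 - lam) * self.F
        # TD error: delta_t = R_{t+1} + gamma_{t+1} theta_t^T phi_{t+1} - theta_t^T phi_t
        v = dot(self.theta, phi)
        v_next = dot(self.theta, phi_next)
        delta = reward + gamma_next * v_next - v
        # Trace: e_t = rho_t (gamma_t lambda_t e_{t-1}
        #                     + alpha M_t (1 - rho_t gamma_t lambda_t e_{t-1}^T phi_t) phi_t)
        gl = gamma * lam
        s = a * m * (1.0 - rho * gl * dot(self.e, phi))
        self.e = [rho * (gl * ei + s * p) for ei, p in zip(self.e, phi)]
        # Weights: theta_{t+1} = theta_t + delta_t e_t
        #          + (e_t - alpha M_t rho_t phi_t) (theta_t - theta_{t-1})^T phi_t
        dv = v - self.v_old
        c = a * m * rho
        self.theta = [th + delta * ei + dv * (ei - c * p)
                      for th, ei, p in zip(self.theta, self.e, phi)]
        # Cache theta_t^T phi_{t+1}: at the next call it serves as theta_{t-1}^T phi_t,
        # provided that call's phi equals this call's phi_next.
        self.v_old = v_next
        self.rho_prev = rho
        return delta


if __name__ == "__main__":
    # On-policy demo: one state that loops to itself with reward 1 and gamma = 0.5,
    # so the true value is 1 / (1 - 0.5) = 2.
    agent = TrueOnlineEmphaticTD(n_features=1, alpha=0.1)
    for _ in range(2000):
        agent.step([1.0], 1.0, [1.0], rho=1.0, gamma=0.5, gamma_next=0.5,
                   lam=0.5, interest=1.0)
    print(agent.theta[0])  # should be very close to 2.0
```

Because e_t − αM_tρ_tφ_t is zero whenever γ_tλ_t = 0, the cached value `v_old` needs no special reset at the first step or after an episode boundary marked by γ_t = 0. Storing this one scalar avoids keeping a second copy of the weight vector.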
