True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide
Richard S. Sutton

TL;DR
This document is a guide to implementing true online emphatic TD(λ), a model-free temporal-difference algorithm for long-term prediction that combines the emphasis idea with the true-online idea.
Contribution
It offers a detailed implementation guide for true online emphatic TD(λ), integrating emphasis, true-online methods, and off-policy learning with linear function approximation.
Findings
Targets long-term prediction with a model-free temporal-difference method.
Supports off-policy training and the emphasis algorithm's notion of "interest".
Combines the emphasis idea and the true-online idea under linear function approximation.
Abstract
This document is a guide to the implementation of true online emphatic TD(λ), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and the true-online idea (van Seijen & Sutton 2014). The setting used here includes linear function approximation, the possibility of off-policy training, and all the generality of general value functions, as well as the emphasis algorithm's notion of "interest".
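To make the ingredients named in the abstract concrete, here is a minimal Python sketch of one learning step as the update is commonly stated for this algorithm: a followon trace F, an emphasis M built from interest and F, an emphasis-weighted dutch-style eligibility trace, and a true-online correction term. The class name `TrueOnlineETD`, the `learn` signature, and all argument names are assumptions made for illustration and are not taken from the guide. The update rules should be checked against the guide's own equations before any serious use.

```python
import numpy as np

class TrueOnlineETD:
    """Illustrative sketch (hypothetical names) of true online emphatic TD(lambda)
    with linear function approximation."""

    def __init__(self, n_features, alpha):
        self.alpha = alpha                  # step size
        self.theta = np.zeros(n_features)   # weight vector
        self.e = np.zeros(n_features)       # eligibility trace
        self.F = 0.0   # carries rho_{t-1} * gamma_t * F_{t-1} into the next step
        self.D = 0.0   # carries (theta_t - theta_{t-1})^T phi_t into the next step

    def learn(self, phi, interest, lam, gamma, rho, reward, phi_next, gamma_next):
        """One transition. phi, interest, lam, gamma, rho belong to time t;
        reward, phi_next, gamma_next belong to time t+1."""
        alpha, theta, e_prev = self.alpha, self.theta, self.e

        # Ordinary TD error with state-dependent discounting.
        delta = reward + gamma_next * (theta @ phi_next) - theta @ phi

        # Followon trace and emphasis.
        F = self.F + interest
        M = lam * interest + (1.0 - lam) * F

        # Emphasis-weighted, importance-sampled dutch-style trace.
        decay = rho * gamma * lam
        e = decay * e_prev + rho * alpha * M * (1.0 - decay * (phi @ e_prev)) * phi

        # True-online weight update: TD step plus correction term.
        step = delta * e + self.D * (e - alpha * M * rho * phi)
        self.theta = theta + step

        # Quantities carried to the next time step.
        self.e = e
        self.D = step @ phi_next
        self.F = rho * gamma_next * F
        return delta
```

A possible usage, under the same assumptions: with a single feature, on-policy data (`rho=1`), unit interest, and a constant reward of 1 that terminates every step (`gamma=0`, `gamma_next=0`), repeated calls such as `agent.learn(np.array([1.0]), 1.0, 1.0, 0.0, 1.0, 1.0, np.array([1.0]), 0.0)` move the single weight toward 1. Passing discount and bootstrapping values per step, rather than fixing them at construction, is one way to accommodate the general-value-function setting the abstract mentions.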
