True Online Temporal-Difference Learning
Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton

TL;DR
This paper studies true online TD(λ) and true online Sarsa(λ), which maintain exact equivalence with their forward view, and shows through extensive experiments that they often learn faster than the traditional methods and are simpler to apply.
Contribution
The paper provides the first comprehensive empirical comparison showing true online methods outperform traditional TD(λ) and Sarsa(λ), and offers a theoretical framework for deriving new true online algorithms.
Findings
True online methods often learn faster than regular methods.
True online methods do not require choosing between trace types (such as accumulating or replacing traces).
They maintain exact forward view equivalence at all times.
Abstract
The temporal-difference methods TD(λ) and Sarsa(λ) form a core part of modern reinforcement learning. Their appeal comes from their good performance, low computational cost, and their simple interpretation, given by their forward view. Recently, new versions of these methods were introduced, called true online TD(λ) and true online Sarsa(λ), respectively (van Seijen & Sutton, 2014). These new versions maintain an exact equivalence with the forward view at all times, whereas the traditional versions only approximate it for small step-sizes. We hypothesize that these true online methods not only have better theoretical properties, but also dominate the regular methods empirically. In this article, we put this hypothesis to the test by performing an extensive empirical comparison. Specifically, we compare the performance of true online…
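For orientation, the following is a minimal Python sketch of the true online TD(λ) update with linear function approximation, as it is commonly stated in the literature. The summary above does not spell out the update, so treat the code as illustrative rather than as the authors' reference implementation. The function name `true_online_td_episode`, the `(x, reward, x_next)` transition format, and the convention that a terminal state has an all-zero feature vector are assumptions made for this example.

```python
import numpy as np


def true_online_td_episode(w, transitions, alpha, gamma, lam):
    """Process one episode with true online TD(lambda), linear features.

    w           : 1-D weight vector (not modified; a new array is returned)
    transitions : iterable of (x, reward, x_next) tuples, where x and x_next
                  are feature vectors; x_next is all zeros at termination
                  (an assumption of this sketch)
    alpha       : step-size
    gamma       : discount factor
    lam         : trace-decay parameter lambda
    """
    z = np.zeros_like(w, dtype=float)  # dutch eligibility trace
    v_old = 0.0                        # value estimate carried over from the previous step
    w = np.asarray(w, dtype=float)

    for x, r, x_next in transitions:
        v = w @ x            # value of the current state under the current weights
        v_next = w @ x_next  # value of the next state under the current weights
        delta = r + gamma * v_next - v  # ordinary TD error

        # Dutch-trace update: decay, then add the features scaled by a
        # correction term that depends on the step-size.
        z = gamma * lam * z + (1.0 - alpha * gamma * lam * (z @ x)) * x

        # Weight update with the extra (v - v_old) correction terms that
        # distinguish the true online method from conventional TD(lambda).
        w = w + alpha * (delta + v - v_old) * z - alpha * (v - v_old) * x

        v_old = v_next

    return w
```

With `lam=0` the trace reduces to the current feature vector and the two correction terms cancel, so the update collapses to ordinary TD(0); this gives a quick sanity check for any implementation.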
Taxonomy
Topics: Reinforcement Learning in Robotics · Muscle activation and electromyography studies · Ferroelectric and Negative Capacitance Devices
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
