An Empirical Evaluation of True Online TD(λ)
Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton

TL;DR
This paper empirically evaluates true online TD(λ), demonstrating its advantages over traditional TD(λ) in terms of speed, stability, and ease of use across various domains and feature representations.
Contribution
It provides the first comprehensive empirical comparison showing true online TD(λ)'s superior performance and usability over TD(λ) in diverse reinforcement learning scenarios.
Findings
True online TD(λ) has minimal overhead with sparse features.
It often learns faster than TD(λ) across tested domains.
It is easier to use and more stable with respect to the step size.
Abstract
The true online TD(λ) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD(λ) algorithm, in temporal-difference learning and reinforcement learning. True online TD(λ) has better theoretical properties than conventional TD(λ), and the expectation is that it also results in faster learning. In this paper, we put this hypothesis to the test. Specifically, we compare the performance of true online TD(λ) with that of TD(λ) on challenging examples, random Markov reward processes, and a real-world myoelectric prosthetic arm. We use linear function approximation with tabular, binary, and non-binary features. We assess the algorithms along three dimensions: computational cost, learning speed, and ease of use. Our results confirm the strength of true online TD(λ): 1) for sparse…
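To make the comparison in the abstract concrete, the following is a minimal sketch of the two update rules under linear function approximation. This summary does not state the updates, so the code follows their commonly published form (accumulating traces for conventional TD(λ), dutch traces plus a correction term for true online TD(λ)). The function names, the two-state cyclic reward process, and the parameter values are illustrative assumptions, not the paper's experimental setup.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def td_lambda(transitions, n_features, alpha, gamma, lam):
    """Conventional TD(lambda) with accumulating traces (sketch)."""
    w = [0.0] * n_features
    z = [0.0] * n_features
    for x, r, x_next in transitions:
        delta = r + gamma * dot(w, x_next) - dot(w, x)
        # Accumulating trace: decay, then add the current features.
        z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w


def true_online_td_lambda(transitions, n_features, alpha, gamma, lam):
    """True online TD(lambda) with dutch traces (sketch)."""
    w = [0.0] * n_features
    z = [0.0] * n_features
    v_old = 0.0
    for x, r, x_next in transitions:
        v = dot(w, x)
        v_next = dot(w, x_next)
        delta = r + gamma * v_next - v
        # Dutch trace: the added features are scaled down by the
        # overlap between the old trace and the current features.
        zx = dot(z, x)
        z = [gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
             for zi, xi in zip(z, x)]
        # Extra (v - v_old) terms correct for the weight change made
        # since the current state's value was last estimated.
        w = [wi + alpha * (delta + v - v_old) * zi - alpha * (v - v_old) * xi
             for wi, zi, xi in zip(w, z, x)]
        v_old = v_next
    return w


# Hypothetical test problem: a deterministic two-state cycle with
# tabular features and reward 1 on the transition s1 -> s0.
s0, s1 = [1.0, 0.0], [0.0, 1.0]
cycle = [(s0, 0.0, s1), (s1, 1.0, s0)]
trajectory = cycle * 2500

w_td = td_lambda(trajectory, 2, alpha=0.1, gamma=0.5, lam=0.5)
w_to = true_online_td_lambda(trajectory, 2, alpha=0.1, gamma=0.5, lam=0.5)
print("TD(lambda):            ", w_td)
print("true online TD(lambda):", w_to)
```

In this sketch the two methods differ only through the trace update and the `v - v_old` correction, which is why the extra cost per step is a few additional inner products and vector operations. With `lam=0.0` both reduce to the same TD(0) update. On this toy cycle with discount 0.5, the exact state values are 2/3 and 4/3, which both estimates should approach.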
