An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task
Sina Ghiassian, Richard S. Sutton

TL;DR
This paper empirically compares eleven off-policy prediction algorithms using linear function approximation on the Collision task, highlighting their relative performance, robustness, and sensitivity to parameters.
Contribution
It provides a comprehensive empirical evaluation of prominent off-policy prediction algorithms on a standardized task, revealing their relative strengths and weaknesses.
Findings
Emphatic-TD algorithms learned fastest and were most robust.
Gradient-TD algorithms showed moderate sensitivity to parameters.
Vtrace, Tree Backup, and ABQ learned more slowly and ended with higher errors.
Abstract
Off-policy prediction -- learning the value function for one policy from data generated while following another policy -- is one of the most challenging subproblems in reinforcement learning. This paper presents empirical results with eleven prominent off-policy learning algorithms that use linear function approximation: five Gradient-TD methods, two Emphatic-TD methods, Off-policy TD(λ), Vtrace, and versions of Tree Backup and ABQ modified to apply to a prediction setting. Our experiments used the Collision task, a small idealized off-policy problem analogous to that of an autonomous car trying to predict whether it will collide with an obstacle. We assessed the performance of the algorithms according to their learning rate, asymptotic error level, and sensitivity to step-size and bootstrapping parameters. By these measures, the eleven algorithms can be partially ordered on the…
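To make the setting concrete, here is a minimal sketch of one update of Off-policy TD(λ) with linear function approximation, the simplest of the methods the abstract lists. It uses the standard textbook form with an importance sampling ratio and an accumulating eligibility trace. It is not taken from the authors' code; the function name, argument names, and list-based vectors are assumptions made for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(ai * bi for ai, bi in zip(a, b))


def off_policy_td_lambda_step(w, z, x, reward, x_next, rho, alpha, gamma, lam):
    """One step of Off-policy TD(lambda) with linear features (illustrative).

    w      : weight vector; the value estimate of a state is dot(w, features)
    z      : eligibility trace vector
    x      : feature vector of the current state
    x_next : feature vector of the next state
    rho    : importance sampling ratio pi(a|s) / b(a|s) for the action taken
    alpha  : step-size parameter
    gamma  : discount factor
    lam    : bootstrapping parameter (lambda)
    Returns the updated (w, z) as new lists.
    """
    # TD error under the current linear value estimate.
    delta = reward + gamma * dot(w, x_next) - dot(w, x)
    # Decay the trace, add the current features, then reweight by rho.
    z_new = [rho * (gamma * lam * zi + xi) for zi, xi in zip(z, x)]
    # Move the weights along the trace in proportion to the TD error.
    w_new = [wi + alpha * delta * zi for wi, zi in zip(w, z_new)]
    return w_new, z_new
```

When the behavior policy takes an action the target policy would never take, rho is 0, the trace is reset to zero, and the weights are left unchanged for that step. The step-size alpha and the bootstrapping parameter lam are the two parameters whose sensitivity the abstract says was assessed.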
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Advanced Multi-Objective Optimization Algorithms
