An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task

Sina Ghiassian; Richard S. Sutton

arXiv:2106.00922 · cs.LG · June 15, 2021

TL;DR

This paper empirically compares eleven off-policy prediction algorithms using linear function approximation on the Collision task, highlighting their relative performance, robustness, and sensitivity to parameters.

Contribution

It provides an empirical evaluation of eleven prominent off-policy prediction algorithms on a single small, idealized task, comparing them by learning rate, asymptotic error level, and sensitivity to step-size and bootstrapping parameters.

Findings

1. Emphatic-TD algorithms learned fastest and were most robust.
2. Gradient-TD algorithms showed moderate sensitivity to parameters.
3. Vtrace, Tree Backup, and ABQ learned more slowly and reached higher errors.

Abstract

Off-policy prediction -- learning the value function for one policy from data generated while following another policy -- is one of the most challenging subproblems in reinforcement learning. This paper presents empirical results with eleven prominent off-policy learning algorithms that use linear function approximation: five Gradient-TD methods, two Emphatic-TD methods, Off-policy TD($\lambda$), Vtrace, and versions of Tree Backup and ABQ modified to apply to a prediction setting. Our experiments used the Collision task, a small idealized off-policy problem analogous to that of an autonomous car trying to predict whether it will collide with an obstacle. We assessed the performance of the algorithms according to their learning rate, asymptotic error level, and sensitivity to step-size and bootstrapping parameters. By these measures, the eleven algorithms can be partially ordered on the…
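To make the setting in the abstract concrete, here is a minimal sketch of one update of Off-policy TD($\lambda$) with linear function approximation, in a commonly used textbook form: an importance sampling ratio `rho` scales an accumulating eligibility trace. This illustrates the general technique only and is not the authors' code. The function name, argument names, and the toy numbers are hypothetical, and the Collision task itself is not modeled.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def off_policy_td_lambda_step(
    w: Sequence[float],       # weight vector of the linear value estimate
    z: Sequence[float],       # eligibility trace from the previous step
    x: Sequence[float],       # feature vector of the current state
    x_next: Sequence[float],  # feature vector of the next state
    reward: float,
    rho: float,               # importance ratio pi(a|s) / b(a|s), assumed given
    alpha: float,             # step-size parameter
    gamma: float,             # discount factor
    lam: float,               # bootstrapping parameter lambda
) -> Tuple[List[float], List[float]]:
    """One Off-policy TD(lambda) update; returns (new_w, new_z)."""
    # TD error under the current linear estimate v(s) = w . x(s).
    delta = reward + gamma * dot(w, x_next) - dot(w, x)
    # Decay the old trace, add the current features, then scale by rho.
    z_new = [rho * (gamma * lam * zi + xi) for zi, xi in zip(z, x)]
    # Move the weights along the trace in proportion to the TD error.
    w_new = [wi + alpha * delta * zi for wi, zi in zip(w, z_new)]
    return w_new, z_new


# Toy transition with two one-hot features (made-up numbers).
w_new, z_new = off_policy_td_lambda_step(
    w=[0.0, 0.0], z=[0.0, 0.0],
    x=[1.0, 0.0], x_next=[0.0, 1.0],
    reward=1.0, rho=2.0, alpha=0.1, gamma=0.9, lam=0.5,
)
```

When `rho` is 0, meaning the target policy would never take the observed action, the trace is zeroed and the weights do not change on that step. The step-size `alpha` and bootstrapping parameter `lam` are the two parameters whose sensitivity the abstract says was assessed.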

Taxonomy

Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Advanced Multi-Objective Optimization Algorithms