Should All Temporal Difference Learning Use Emphasis?

Xiang Gu; Sina Ghiassian; Richard S. Sutton

arXiv:1903.00194 · cs.AI · March 4, 2019 · 1 citation

TL;DR

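This paper shows empirically that Emphatic Temporal Difference (ETD) learning converges on several well-known on-policy problems where conventional TD either diverges or performs poorly, and that ETD outperforms TD on the mountain car prediction problem. Together with similar off-policy results from prior work, this suggests ETD as a promising alternative to conventional TD.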

Contribution

The study provides empirical evidence that ETD can outperform TD under on-policy training, challenging the notion that ETD is only beneficial for off-policy convergence.

Findings

1. ETD converges on several well-known on-policy experiments where TD either diverges or performs poorly.

2. ETD outperforms TD on the mountain car prediction task.

3. ETD shows potential as a general substitute for conventional TD learning.

Abstract

Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address convergence issues of conventional Temporal Difference (TD) learning under off-policy training, but it differs from conventional TD learning even under on-policy training. A simple counterexample provided back in 2017 pointed to a potential class of problems where ETD converges but TD diverges. In this paper, we empirically show that ETD converges on a few other well-known on-policy experiments where TD either diverges or performs poorly. We also show that ETD outperforms TD on the mountain car prediction problem. Our results, together with a similar pattern observed under off-policy training in prior works, suggest that ETD might be a good substitute for conventional TD.
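
The abstract notes that ETD differs from conventional TD even under on-policy training. The sketch below illustrates where that difference comes from. It is not taken from this page or from the authors' repository: it assumes the commonly published ETD(λ) update with linear function approximation, a constant discount `gamma`, and a constant interest of 1. The class name `EmphaticTD` and all parameter names are hypothetical. With importance ratio `rho = 1` (on-policy), the emphasis `M` still grows over time and rescales each update, which is exactly what plain TD lacks.

```python
import numpy as np


class EmphaticTD:
    """Minimal ETD(lambda) sketch with linear features (illustrative only)."""

    def __init__(self, n_features, alpha, gamma, lam=0.0):
        self.theta = np.zeros(n_features)  # weight vector
        self.e = np.zeros(n_features)      # eligibility trace
        self.F = 0.0                       # followon trace
        self.prev_rho = 1.0                # importance ratio of previous step
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def reset_traces(self):
        # Call at the start of each episode.
        self.e[:] = 0.0
        self.F = 0.0
        self.prev_rho = 1.0

    def update(self, x, reward, x_next, rho=1.0, interest=1.0):
        # Followon trace: discounted accumulation of interest.
        self.F = self.prev_rho * self.gamma * self.F + interest
        # Emphasis: blend of immediate interest and followon trace.
        M = self.lam * interest + (1.0 - self.lam) * self.F
        # Emphasis-weighted eligibility trace.
        self.e = rho * (self.gamma * self.lam * self.e + M * x)
        # Ordinary TD error (use a zero vector for x_next at termination).
        delta = reward + self.gamma * (self.theta @ x_next) - self.theta @ x
        self.theta += self.alpha * delta * self.e
        self.prev_rho = rho
        return M


if __name__ == "__main__":
    agent = EmphaticTD(n_features=2, alpha=0.1, gamma=0.9)
    x, x_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    emphases = [agent.update(x, 0.0, x_next) for _ in range(3)]
    print(emphases)
```

Under these assumptions, with `lam=0` and `rho=1` the emphasis follows F = 0.9 · F + 1, so it increases from 1 toward 1 / (1 - gamma) within an episode. Setting `M` to 1 on every step would recover the conventional TD(λ) update.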

Code & Models

Repositories

Xiang-Gu/Should-ALL-Temporal-Difference-Learning-Use-Emphasis

Taxonomy

Topics: Reinforcement Learning in Robotics