Temporal Second Difference Traces

Mitchell Keith Bloch

arXiv:1104.4664 · cs.LG · March 19, 2015 · 2 citations

TL;DR

This paper introduces the temporal second difference trace (TSDT), a new model-free, off-policy temporal-difference method that makes more efficient use of experience, especially in deterministic domains, and that outperforms Q-learning and Watkins' Q(λ) in a deterministic cliff-walking domain.

Contribution

The paper proposes two off-policy TD methods, Optimistic Q(λ) and TSDT. TSDT makes better use of experience without relying on recency or frequency heuristics, and the paper demonstrates its advantages over existing methods in a deterministic setting.

Findings

1. TSDT outperforms Q-learning and Watkins' Q(λ) in a deterministic cliff-walking domain.
2. TSDT's advantages diminish in noisy environments.
3. Optimistic Q(λ) is effective in noisy domains.

Abstract

Q-learning is a reliable but inefficient off-policy temporal-difference method, backing up reward only one step at a time. Replacing traces, using a recency heuristic, are more efficient but less reliable. In this work, we introduce model-free, off-policy temporal difference methods that make better use of experience than Watkins' Q(λ). We introduce both Optimistic Q(λ) and the temporal second difference trace (TSDT). TSDT is particularly powerful in deterministic domains. TSDT uses neither recency nor frequency heuristics, storing (s, a, r, s', δ) so that off-policy updates can be performed after apparently suboptimal actions have been taken. There are additional advantages when using state abstraction, as in MAXQ. We demonstrate that TSDT does significantly better than both Q-learning and Watkins' Q(λ) in a deterministic cliff-walking domain. Results in a noisy…
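The abstract contrasts one-step Q-learning, Watkins' Q(λ) with replacing traces, and the idea of storing transitions so that off-policy backups can continue past apparently suboptimal actions. The minimal Python sketch below illustrates that contrast on a hypothetical toy problem: a deterministic 4-state chain with a terminal goal state and one recorded episode (all names, values, and the environment are assumptions, not from the paper). `watkins_q_lambda` follows the textbook Watkins' Q(λ) with replacing traces. `backward_replay` is NOT TSDT, whose update rule the abstract does not give, and it ignores the stored δ. It is a simpler stand-in that replays stored (s, a, r, s') tuples newest-first with Q-learning max backups.

```python
from typing import List, Tuple

# Hypothetical transition format: (s, a, r, s_next, done) from a recorded episode.
Transition = Tuple[int, int, float, int, bool]
QTable = List[List[float]]


def watkins_q_lambda(q: QTable, episode: List[Transition],
                     alpha: float = 1.0, gamma: float = 0.9, lam: float = 0.9) -> QTable:
    """Watkins' Q(lambda) with replacing traces, applied to one recorded episode.

    Traces are cut (reset to zero) whenever the next action taken is not greedy,
    so credit stops flowing back past an exploratory action.
    """
    n_states, n_actions = len(q), len(q[0])
    e = [[0.0] * n_actions for _ in range(n_states)]
    for t, (s, a, r, s_next, done) in enumerate(episode):
        last = done or t + 1 == len(episode)
        # Decide whether the next action is greedy *before* updating Q (ties count as greedy).
        next_greedy = (not last) and q[s_next][episode[t + 1][1]] == max(q[s_next])
        target = r if done else r + gamma * max(q[s_next])
        delta = target - q[s][a]
        # Replacing trace: clear traces of the other actions in s, then set e(s, a) = 1.
        e[s] = [0.0] * n_actions
        e[s][a] = 1.0
        for i in range(n_states):
            for j in range(n_actions):
                q[i][j] += alpha * delta * e[i][j]
        if last:
            break
        if next_greedy:
            e = [[gamma * lam * x for x in row] for row in e]  # decay traces
        else:
            e = [[0.0] * n_actions for _ in range(n_states)]  # cut traces
    return q


def backward_replay(q: QTable, episode: List[Transition],
                    alpha: float = 1.0, gamma: float = 0.9) -> QTable:
    """Illustrative stand-in, NOT TSDT: replay stored transitions newest-first with
    one-step Q-learning backups. Each backup bootstraps from max Q(s', .), so it
    stays off-policy even across exploratory actions."""
    for s, a, r, s_next, done in reversed(episode):
        target = r if done else r + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
    return q


def make_q() -> QTable:
    # 4 non-terminal states x 2 actions (0 = left, 1 = right); state 4 is terminal.
    # The small bias makes "left" look greedy, so every "right" move is exploratory.
    return [[0.1, 0.0] for _ in range(4)]


# Walk right from state 0 to the goal; reward 1 on entering terminal state 4.
episode: List[Transition] = [(0, 1, 0.0, 1, False), (1, 1, 0.0, 2, False),
                             (2, 1, 0.0, 3, False), (3, 1, 1.0, 4, True)]

print([round(row[1], 3) for row in watkins_q_lambda(make_q(), episode)])  # [0.09, 0.09, 0.09, 1.0]
print([round(row[1], 3) for row in backward_replay(make_q(), episode)])   # [0.729, 0.81, 0.9, 1.0]
```

Because every "right" move is non-greedy under the biased initial values, Watkins' Q(λ) cuts its trace at every step and reduces to one-step Q-learning here, so the goal reward reaches only state 3. Replaying the stored transitions propagates it back to state 0 in a single pass, which is the kind of gap in deterministic domains that the abstract says TSDT targets.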

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Software Testing and Debugging Techniques