Temporal Second Difference Traces
Mitchell Keith Bloch

TL;DR
This paper introduces the temporal second difference trace (TSDT), a model-free, off-policy temporal-difference method that makes better use of experience than Watkins' Q(λ). It is most effective in deterministic domains, where it outperforms both Q-learning and Watkins' Q(λ) on a cliff-walking task.
Contribution
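The paper proposes two model-free, off-policy TD methods: Optimistic Q(λ) and TSDT. TSDT uses neither recency nor frequency heuristics; it stores (s, a, r, s', δ) so that off-policy updates can still be performed after apparently suboptimal actions have been taken. The paper demonstrates TSDT's advantage over Q-learning and Watkins' Q(λ) in a deterministic cliff-walking domain.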
Findings
TSDT outperforms Q-learning and Watkins' Q(λ) in deterministic cliff-walking.
TSDT's advantages diminish in noisy environments.
Optimistic Q(λ) shows efficacy in noisy domains.
Abstract
Q-learning is a reliable but inefficient off-policy temporal-difference method, backing up reward only one step at a time. Replacing traces, using a recency heuristic, are more efficient but less reliable. In this work, we introduce model-free, off-policy temporal difference methods that make better use of experience than Watkins' Q(λ). We introduce both Optimistic Q(λ) and the temporal second difference trace (TSDT). TSDT is particularly powerful in deterministic domains. TSDT uses neither recency nor frequency heuristics, storing (s, a, r, s', δ) so that off-policy updates can be performed after apparently suboptimal actions have been taken. There are additional advantages when using state abstraction, as in MAXQ. We demonstrate that TSDT does significantly better than both Q-learning and Watkins' Q(λ) in a deterministic cliff-walking domain. Results in a noisy…
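To make the abstract's terms concrete, the following is a minimal sketch of a standard tabular Q-learning one-step backup that also records the tuple (s, a, r, s', δ) mentioned above. It is not the TSDT algorithm, whose update rule is not given in this summary. The names `Transition` and `q_learning_step`, the step size `alpha`, and the discount `gamma` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Hashable, NamedTuple


class Transition(NamedTuple):
    """Hypothetical record holding (s, a, r, s', delta) for one step."""
    s: Hashable
    a: Hashable
    r: float
    s_next: Hashable
    delta: float


def q_learning_step(Q, actions, s, a, r, s_next,
                    alpha=0.5, gamma=1.0, terminal=False):
    """Standard one-step Q-learning backup (not TSDT).

    Computes the TD error delta = r + gamma * max_b Q(s', b) - Q(s, a),
    moves Q(s, a) toward the target by alpha * delta, and returns the
    transition together with its TD error.
    """
    if terminal:
        target = r
    else:
        target = r + gamma * max(Q[(s_next, b)] for b in actions)
    delta = target - Q[(s, a)]
    Q[(s, a)] += alpha * delta
    return Transition(s, a, r, s_next, delta)


# Usage: a single step with reward -1 from state 0 to state 1.
Q = defaultdict(float)          # all action values start at 0.0
actions = ("left", "right")     # assumed action set for illustration
t = q_learning_step(Q, actions, s=0, a="right", r=-1.0, s_next=1)
```

Because this update only touches Q(s, a) for the step just taken, reward information propagates one step per visit, which is the inefficiency the abstract attributes to Q-learning.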
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Software Testing and Debugging Techniques
