A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

Long Yang; Minhao Shi; Qian Zheng; Wenjia Meng; Gang Pan

arXiv:1802.03171·cs.AI·February 12, 2018·5 cites


Open Access

TL;DR

This paper introduces a new reinforcement learning algorithm, Q(σ,λ), that unifies existing multi-step TD methods with eligibility traces, improving learning speed and efficiency.

Contribution

The paper develops Q(σ,λ), combining multi-step TD learning with eligibility traces, and provides theoretical convergence guarantees and empirical performance improvements.

Findings

01. Q(σ,λ) converges exponentially to the optimal value function.

02. Intermediate σ values lead to faster learning than extreme values.

03. Q(σ,λ) outperforms traditional TD methods in experiments.

Abstract

Recently, a new multi-step temporal-difference learning algorithm, called $Q(σ)$, unifies $n$-step Tree-Backup (when $σ = 0$) and $n$-step Sarsa (when $σ = 1$) by introducing a sampling parameter $σ$. However, like other multi-step temporal-difference learning algorithms, $Q(σ)$ requires substantial memory and computation time. Eligibility traces are an important mechanism for transforming off-line updates into efficient on-line ones that consume less memory and computation time. In this paper, we further develop the original $Q(σ)$, combine it with eligibility traces, and propose a new algorithm, called $Q(σ, λ)$, in which $λ$ is the trace-decay parameter. This idea unifies Sarsa$(λ)$ (when $σ = 1$) and $Q^{π}(λ)$ (when $σ = 0$). Furthermore, we give an upper error bound of the $Q(σ, λ)$ policy evaluation algorithm. We…
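The role of the sampling parameter σ can be illustrated with a minimal Python sketch of a one-step σ-weighted backup target, assuming the commonly described Q(σ) form in which σ interpolates between a sampled next-action value (Sarsa-style) and an expectation under the target policy (Tree-Backup-style). The function name, argument names, and numbers are hypothetical and not taken from the paper, and the sketch omits eligibility traces and λ entirely.

```python
def sigma_backup_target(reward, gamma, q_next, pi_next, next_action, sigma):
    """One-step sigma-weighted target (illustrative sketch, not the paper's code).

    q_next:      action values Q(s', a) for each action a in the next state
    pi_next:     target-policy probabilities pi(a | s') for each action a
    next_action: index of the action actually sampled in s'
    sigma:       1.0 -> pure sampling (Sarsa-style), 0.0 -> pure expectation
    """
    sampled = q_next[next_action]                            # sampled backup
    expected = sum(p * q for p, q in zip(pi_next, q_next))   # expected backup
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


# Hypothetical numbers chosen only to make the arithmetic easy to follow.
q_next = [1.0, 3.0]
pi_next = [0.5, 0.5]

sarsa_target = sigma_backup_target(1.0, 0.9, q_next, pi_next, 1, sigma=1.0)
tree_target = sigma_backup_target(1.0, 0.9, q_next, pi_next, 1, sigma=0.0)
mixed_target = sigma_backup_target(1.0, 0.9, q_next, pi_next, 1, sigma=0.5)
```

With these inputs, σ = 1 uses only the sampled value 3.0, σ = 0 uses only the policy-weighted average 2.0, and σ = 0.5 lands halfway between the two targets.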


Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications

Methods: Eligibility Trace · Sarsa