A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning
Long Yang, Minhao Shi, Qian Zheng, Wenjia Meng, Gang Pan

TL;DR
This paper introduces a new reinforcement learning algorithm, Q(σ,λ), that unifies existing multi-step TD methods with eligibility traces, improving learning speed and efficiency.
Contribution
The paper develops Q(σ,λ), combining multi-step TD learning with eligibility traces, and provides theoretical convergence guarantees and empirical performance improvements.
Findings
The Q(σ,λ) control algorithm converges to the optimal value function at an exponential rate.
Intermediate values of σ lead to faster learning than the extremes (σ = 0 or σ = 1).
Q(σ,λ) outperforms traditional TD methods in experiments.
Abstract
Recently, a new multi-step temporal learning algorithm, called Q(σ), unifies n-step Tree-Backup (when σ = 0) and n-step Sarsa (when σ = 1) by introducing a sampling parameter σ. However, similar to other multi-step temporal-difference learning algorithms, Q(σ) needs much memory consumption and computation time. Eligibility trace is an important mechanism to transform the off-line updates into efficient on-line ones which consume less memory and computation time. In this paper, we further develop the original Q(σ), combine it with eligibility traces and propose a new algorithm, called Q(σ,λ), in which λ is the trace-decay parameter. This idea unifies Sarsa(λ) (when σ = 1) and Q^π(λ) (when σ = 0). Furthermore, we give an upper error bound for the Q(σ,λ) policy evaluation algorithm. We…
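The unification described in the abstract can be illustrated with a small tabular sketch. It is not the authors' implementation: the function names, the nested-list Q-table, the accumulating-trace form, and the order of the decay and update steps are all assumptions made for illustration, and terminal states are not handled. The idea shown is that σ blends a sampled (Sarsa-style) bootstrap value with an expected value under the target policy, while λ controls how quickly eligibility traces decay.

```python
def mixed_td_error(q, policy, s, a, r, s_next, a_next, sigma, gamma):
    """TD error whose bootstrap value mixes a sample and an expectation.

    q[s][a] is a tabular action-value estimate and policy[s][a] is the
    target-policy probability (both hypothetical data layouts).
    """
    sampled = q[s_next][a_next]  # used fully when sigma = 1
    expected = sum(p * v for p, v in zip(policy[s_next], q[s_next]))  # used fully when sigma = 0
    target = r + gamma * (sigma * sampled + (1.0 - sigma) * expected)
    return target - q[s][a]


def trace_update(q, traces, policy, transition, sigma, lam, gamma, alpha):
    """One on-line update with accumulating eligibility traces (assumed form)."""
    s, a, r, s_next, a_next = transition
    delta = mixed_td_error(q, policy, s, a, r, s_next, a_next, sigma, gamma)

    # Decay every trace by gamma * lambda, then mark the visited pair.
    for i in range(len(traces)):
        for j in range(len(traces[i])):
            traces[i][j] *= gamma * lam
    traces[s][a] += 1.0

    # Move every state-action value in proportion to its trace.
    for i in range(len(q)):
        for j in range(len(q[i])):
            q[i][j] += alpha * delta * traces[i][j]
    return delta
```

Setting `sigma=1.0` in this sketch gives a Sarsa(λ)-style update, `sigma=0.0` gives an update driven entirely by the expected value under the target policy, and intermediate values interpolate between the two.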
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications
Methods: Eligibility Trace · Sarsa
