META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning
Mingde Zhao

TL;DR
This paper introduces a meta-learning approach to adapt eligibility traces in TD-learning, enhancing sample efficiency and robustness in reinforcement learning by online, state-dependent adjustment of the trace parameter.
Contribution
It proposes a novel meta-learning method for dynamic, state-dependent adjustment of eligibility traces in TD-learning, improving efficiency and robustness.
Findings
Significant performance improvements demonstrated.
Enhanced robustness to learning rate variations.
Applicable to both on-policy and off-policy learning.
Abstract
Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies. TD-learning with eligibility traces provides a way to do temporal credit assignment, i.e. decide which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter λ. However, tuning this parameter can be time-consuming, and not tuning it can lead to inefficient learning. To improve the sample efficiency of TD-learning, we propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner. The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online, incurring roughly the…
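To make the role of a state-dependent trace parameter concrete, the following is a minimal sketch of tabular TD(λ) with accumulating traces, where λ is looked up per state. It does not implement the paper's meta-learning step: the per-state values in `lam` are simply supplied by the caller. The function name `td_lambda_episode`, the transition tuple layout, and the default `alpha` and `gamma` are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def td_lambda_episode(values, episode, lam, alpha=0.1, gamma=1.0):
    """Run one episode of tabular TD(lambda) with a per-state trace parameter.

    values  : float array of state-value estimates, updated in place
    episode : list of (state, reward, next_state, done) transitions
    lam     : array of per-state trace parameters in [0, 1] (assumed given)
    """
    traces = np.zeros_like(values)
    for s, r, s_next, done in episode:
        # One-step TD error; terminal transitions bootstrap from zero.
        target = r + (0.0 if done else gamma * values[s_next])
        delta = target - values[s]
        # Decay all traces by gamma * lambda of the current state,
        # then mark the current state as eligible.
        traces *= gamma * lam[s]
        traces[s] += 1.0
        # Credit the TD error to every state in proportion to its trace.
        values += alpha * delta * traces
    return values

# Hypothetical two-step episode: state 0 -> state 1 -> terminal, reward 1 at the end.
episode = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
v_full = td_lambda_episode(np.zeros(3), episode, lam=np.ones(3))
v_zero = td_lambda_episode(np.zeros(3), episode, lam=np.zeros(3))
```

With λ = 1 in every state the final reward is credited back to state 0 as well as state 1, while with λ = 0 only state 1 is updated. A state-dependent schedule sits between these extremes, and choosing that schedule online is the problem the paper addresses.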
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management
Methods: Eligibility Trace
