META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

Mingde Zhao

arXiv:2006.08906·cs.LG·June 17, 2020

TL;DR

This paper introduces a meta-learning approach that adapts eligibility traces in TD-learning, improving sample efficiency and robustness through online, state-dependent adjustment of the trace parameter.

Contribution

The paper proposes a meta-learning method that adjusts the eligibility trace parameter online and per state, improving the sample efficiency and robustness of TD-learning.

Findings

01 Significant performance improvements demonstrated.

02 Enhanced robustness to learning rate variations.

03 Applicable to both on-policy and off-policy learning.

Abstract

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy and algorithms that learn how to improve policies. TD-learning with eligibility traces provides a way to do temporal credit assignment, i.e., to decide which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter $\lambda$. However, tuning this parameter can be time-consuming, and not tuning it can lead to inefficient learning. To improve the sample efficiency of TD-learning, we propose a meta-learning method for adjusting the eligibility trace parameter in a state-dependent manner. The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online, incurring roughly the…
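
To make the role of $\lambda$ concrete, the following is a minimal sketch of tabular TD($\lambda$) policy evaluation with accumulating traces and a per-state trace parameter. It is not the paper's meta-learning method: the array `lam` is held fixed here, whereas the paper adapts it online. The function name, the episode format, and the default step size are assumptions made only for illustration.

```python
import numpy as np


def td_lambda_state_dependent(episodes, n_states, lam, alpha=0.1, gamma=1.0):
    """Tabular TD(lambda) value estimation with a per-state trace parameter.

    episodes: iterable of episodes; each episode is a list of
              (state, reward, next_state, done) tuples (assumed format).
    lam:      array of shape (n_states,) giving lambda for each state.
              Hypothetical fixed input; the paper learns this online.
    """
    lam = np.asarray(lam, dtype=float)
    v = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)  # eligibility trace, reset every episode
        for s, r, s_next, done in episode:
            # One-step TD error; terminal transitions do not bootstrap.
            target = r + (0.0 if done else gamma * v[s_next])
            delta = target - v[s]
            # Decay all traces by gamma * lambda(current state), then
            # mark the current state as eligible (accumulating trace).
            e *= gamma * lam[s]
            e[s] += 1.0
            # Credit the TD error to every state in proportion to its trace.
            v += alpha * delta * e
    return v
```

With `lam` set to zero everywhere this reduces to one-step TD(0), where only the current state is updated; with `lam` set to one, earlier states in the episode also receive a share of each TD error. Intermediate and state-dependent values trade off between these two extremes, which is the quantity the paper proposes to adapt.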

Code & Models

Repositories

PwnerHarry/META (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management

Methods: Eligibility Trace