Adaptive and Multiple Time-scale Eligibility Traces for Online Deep Reinforcement Learning

Taisuke Kobayashi

arXiv:2008.10040 · cs.RO · January 26, 2022

TL;DR

This paper introduces an adaptive, multi-time-scale eligibility traces method for online deep reinforcement learning, enhancing sample efficiency and adaptability in environments with changing dynamics.

Contribution

The paper proposes a novel eligibility-traces technique that is compatible with deep neural networks. It combines divergence-based adaptive decay with traces at multiple time scales to improve online learning.
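
The two ingredients named above can be illustrated with a small, hypothetical sketch. It is not the author's algorithm. It assumes K traces with different decay rates lambda_k, an extra decay factor exp(-beta * D) that shrinks all traces when a divergence D signals that stored gradients are stale, and a plain average over time scales. The class name, `beta`, the exponential form, the averaging rule, and the squared-error divergence are all illustrative assumptions.

```python
import math
from typing import List, Sequence


class MultiTimescaleTraces:
    """Hypothetical sketch: K eligibility traces with different decay rates,
    each shrunk further when a divergence signal says stored gradients are stale."""

    def __init__(self, dim: int, lambdas: Sequence[float], gamma: float, beta: float = 1.0):
        self.lambdas = list(lambdas)  # one decay rate per time scale (assumed values)
        self.gamma = gamma            # discount factor
        self.beta = beta              # hypothetical sensitivity of the decay to the divergence
        self.traces = [[0.0] * dim for _ in self.lambdas]

    def update(self, grad: Sequence[float], divergence: float) -> List[float]:
        # Adaptive factor in (0, 1]: equals 1 when the divergence is 0 and shrinks as it grows.
        adapt = math.exp(-self.beta * divergence)
        for k, lam in enumerate(self.lambdas):
            decay = self.gamma * lam * adapt
            # Accumulating update on each time scale: decay the old trace, add the new gradient.
            self.traces[k] = [decay * zi + gi for zi, gi in zip(self.traces[k], grad)]
        return self.combined()

    def combined(self) -> List[float]:
        # Hypothetical combination rule: a plain average over the time scales.
        k_count = len(self.traces)
        dim = len(self.traces[0])
        return [sum(z[i] for z in self.traces) / k_count for i in range(dim)]


def output_divergence(old_out: float, new_out: float) -> float:
    """Squared difference between outputs computed with old and new parameters
    (the Bregman divergence generated by y**2); an assumed choice here."""
    return (new_out - old_out) ** 2
```

With `lambdas=[0.5, 0.9]` and `gamma=1.0`, one unit gradient followed by a zero gradient leaves a combined trace of about 0.7 when the divergence is zero. A large divergence (for example 50) drives it to nearly zero, so stale gradients stop influencing the update.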

Findings

01 Enhanced sample efficiency in DRL tasks

02 Effective handling of environment changes

03 Compatibility with deep neural network training

Abstract

Deep reinforcement learning (DRL) is one promising approach to teaching robots to perform complex tasks. In robotic problems with a time-varying environment, methods that directly reuse stored experience data cannot follow changes in the environment, so online DRL is required. The eligibility traces method is a well-known online learning technique for improving sample efficiency in traditional reinforcement learning with linear regressors, but not in DRL. Eligibility traces have not been integrated with DRL because the dependency between the parameters of deep neural networks would destroy them. Replacing the gradient with the most influential one, rather than accumulating the gradients as the eligibility traces, can alleviate this problem, but the replacing operation reduces the number of times previous experiences are reused. To address these issues, this study proposes a…
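
The contrast between accumulating and replacing traces can be illustrated with standard semi-gradient TD(lambda) on a linear value function v(s) = w · x(s). The accumulating update z ← γλz + ∇v is the textbook form. The `replace_trace` function is only one hypothetical element-wise reading of "replacing the gradient with the most influential one" (keep whichever of the decayed trace or the new gradient has the larger magnitude), not necessarily the rule used in the paper. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def accumulate_trace(z: Sequence[float], grad: Sequence[float], gamma: float, lam: float) -> List[float]:
    """Accumulating trace: decay the old trace and add the new gradient."""
    return [gamma * lam * zi + gi for zi, gi in zip(z, grad)]


def replace_trace(z: Sequence[float], grad: Sequence[float], gamma: float, lam: float) -> List[float]:
    """Hypothetical element-wise replacing trace: keep whichever of the decayed
    trace or the new gradient has the larger magnitude."""
    out = []
    for zi, gi in zip(z, grad):
        decayed = gamma * lam * zi
        out.append(gi if abs(gi) >= abs(decayed) else decayed)
    return out


def td_lambda_step(w, z, x, x_next, r, gamma, lam, alpha, replacing=False) -> Tuple[List[float], List[float], float]:
    """One semi-gradient TD(lambda) step for a linear value v(s) = w . x(s)."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = r + gamma * v_next - v  # TD error
    # For a linear regressor the gradient of v w.r.t. w is just the feature vector x.
    update = replace_trace if replacing else accumulate_trace
    z = update(z, x, gamma, lam)
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w, z, delta
```

Visiting the same feature twice with γλ = 0.72 drives the accumulating trace to 1.72, while the replacing trace stays at 1.0. This shows why replacing reduces how much earlier experience is reused. In the linear case the gradient x does not depend on w, so a stored trace remains valid as w changes. With a deep network, by contrast, the stored gradients were computed at older parameters.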