Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error

Bumgeun Park, Taeyoung Kim, Woohyeon Moon, Luiz Felipe Vecchietti, and Dongsoo Har

arXiv:2212.13175·cs.LG·December 27, 2022

TL;DR

This paper introduces a novel loss function weighting method based on TD error to improve off-policy reinforcement learning, enhancing convergence speed and performance when combined with prioritization techniques.

Contribution

The paper proposes a new experience weighting approach in the loss function for off-policy RL, which can be combined with prioritization to boost learning efficiency and effectiveness.

Findings

01. Achieves 33%-76% faster convergence in some environments.

02. Increases returns by 11% in certain tasks.

03. Improves success rates by 3%-10% in others.

Abstract

Training agents via off-policy deep reinforcement learning (RL) requires a large memory, called the replay memory, that stores past experiences used for learning. These experiences are sampled, uniformly or non-uniformly, to create the batches used for training. When calculating the loss function, off-policy algorithms assume that all samples are of equal importance. In this paper, we hypothesize that training can be enhanced by assigning a different importance to each experience, based on its temporal-difference (TD) error, directly in the training objective. We propose a novel method that introduces a weighting factor for each experience when calculating the loss function at the learning stage. In addition to improving convergence speed when used with uniform sampling, the method can be combined with prioritization methods for non-uniform sampling. Combining the proposed method with…
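The idea of weighting each experience by its TD error inside the loss can be illustrated with a minimal Python sketch. The weighting rule used here (weights proportional to the absolute TD error, normalized to a mean of 1) is an assumption chosen for illustration; this page does not give the paper's exact weighting factor, and the function names `td_errors`, `td_weights`, and `weighted_td_loss` are hypothetical.

```python
from typing import List, Sequence


def td_errors(
    rewards: Sequence[float],
    q_values: Sequence[float],
    next_q_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """One-step TD error per sample: r + gamma * (1 - done) * Q_next - Q."""
    return [
        r + gamma * (0.0 if d else 1.0) * nq - q
        for r, q, nq, d in zip(rewards, q_values, next_q_values, dones)
    ]


def td_weights(errors: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical weighting rule (an assumption, not the paper's formula):
    weights proportional to |TD error| + eps, normalized to a mean of 1."""
    mags = [abs(e) + eps for e in errors]
    mean = sum(mags) / len(mags)
    return [m / mean for m in mags]


def weighted_td_loss(errors: Sequence[float], weights: Sequence[float]) -> float:
    """Batch mean of w_i * delta_i^2, with the weights treated as constants."""
    return sum(w * e * e for w, e in zip(weights, errors)) / len(errors)


# Toy batch of two transitions; the second one is terminal.
errors = td_errors(
    rewards=[1.0, 0.0],
    q_values=[0.5, 0.5],
    next_q_values=[1.0, 0.0],
    dones=[False, True],
    gamma=0.5,
)
weights = td_weights(errors)
uniform_loss = weighted_td_loss(errors, [1.0] * len(errors))
weighted_loss = weighted_td_loss(errors, weights)
```

With uniform weights this reduces to the usual mean squared TD loss. With the assumed rule, samples with larger TD error contribute more to the objective. In a deep RL implementation the weights would typically be computed without gradient flow, and under non-uniform sampling they would sit alongside whatever correction the prioritization method already applies.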

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings