Parameter-free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Baturay Saglam; Furkan Burak Mutlu; Dogan Can Cicek; Suleyman Serdar Kozat

arXiv:2109.11788·cs.LG·May 20, 2022

TL;DR

This paper introduces a parameter-free method to reduce estimation bias in deep reinforcement learning, specifically targeting deterministic policy gradients, leading to improved performance on continuous control tasks.

Contribution

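A novel, parameter-free Deep Q-learning variant that reduces underestimation bias in deterministic policy gradients. It samples the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, so the Q-value update rule is not affected by the variance of the rewards.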

Findings

01 Significantly outperforms existing methods on MuJoCo and Box2D tasks.

02 Reduces underestimation bias effectively in high-variance environments.

03 Enhances state-of-the-art performance in deterministic policy gradient methods.

Abstract

Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance…
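As a rough illustration of the update rule the abstract describes, the sketch below mixes two critic estimates with a randomly sampled weight. It is not taken from the authors' code. The function names, the choice to weight the minimum and maximum of the two critics, and the sampling interval [0.5, 1] are all assumptions made for illustration. The abstract only states that the weights come from a "highly shrunk estimation bias interval".

```python
import random


def clipped_double_q_target(reward, gamma, q1_next, q2_next, done):
    """Clipped Double Q-learning target: bootstrap from the smaller critic estimate."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def sample_beta(rng, low=0.5, high=1.0):
    """Draw a mixing weight uniformly from [low, high].

    The interval [0.5, 1.0] is an assumption for this sketch, not a value
    stated in the abstract.
    """
    return rng.uniform(low, high)


def weighted_critic_target(reward, gamma, q1_next, q2_next, done, beta):
    """Hypothetical target built from a linear combination of two critics.

    beta weights the smaller estimate and (1 - beta) weights the larger one,
    so beta = 1 recovers the Clipped Double Q-learning target.
    """
    q_low = min(q1_next, q2_next)
    q_high = max(q1_next, q2_next)
    mixed = beta * q_low + (1.0 - beta) * q_high
    return reward + gamma * (1.0 - done) * mixed


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same weight
    beta = sample_beta(rng)
    target = weighted_critic_target(
        reward=1.0, gamma=0.99, q1_next=2.0, q2_next=3.0, done=0.0, beta=beta
    )
    print(f"beta={beta:.3f} target={target:.3f}")
```

With `beta = 1` the mixed target coincides with the Clipped Double Q-learning target. Smaller values of `beta` shift weight toward the larger critic estimate and raise the target, which is the direction needed to counter underestimation.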

Code & Models

Repositories

baturaysaglam/swtd3 (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Test · Clipped Double Q-learning · Q-Learning · Double Q-learning