Value Activation for Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients

Jiafei Lyu; Yu Yang; Jiangpeng Yan; Xiu Li

arXiv:2112.11216 · cs.LG · December 22, 2021 · 1 citation

TL;DR

This paper introduces GD3, a reinforcement learning algorithm that weights value estimates with a generalized-activated weighting operator, in which any non-decreasing "activation" function supplies the weights, to reduce estimation bias and improve performance in continuous control tasks.

Contribution

The paper proposes a bias correction method that uses activation functions (any non-decreasing function) as weights in value estimation, and builds the GD3 algorithm for deep reinforcement learning on this operator.

Findings

01 GD3 alleviates estimation bias in value functions.

02 Activation functions improve convergence speed and performance.

03 Task-specific activation functions outperform standard baselines.

Abstract

It is vital to accurately estimate the value function in Deep Reinforcement Learning (DRL) so that the agent can execute proper actions instead of suboptimal ones. However, existing actor-critic methods suffer to varying degrees from underestimation bias or overestimation bias, which negatively affects their performance. In this paper, we reveal a simple but effective principle: proper value correction benefits bias alleviation. Based on this principle, we propose the generalized-activated weighting operator, which uses any non-decreasing function, namely an activation function, as weights for better value estimation. In particular, we integrate the generalized-activated weighting operator into value estimation and introduce a novel algorithm, Generalized-activated Deep Double Deterministic Policy Gradients (GD3). We theoretically show that GD3 is capable of alleviating the potential estimation bias. We…
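The weighting idea in the abstract can be illustrated with a small sketch. The abstract only states that a non-decreasing function supplies the weights; the normalized weighted-average form below, the function names `activated_weighted_value` and `exp_activation`, and the sample Q-values are all assumptions made for illustration, not the authors' implementation.

```python
import math


def activated_weighted_value(q_values, activation):
    """Weighted average of Q-value samples, with weights activation(q).

    Assumed form (not taken from the abstract): sum(g(q) * q) / sum(g(q)),
    where g is a non-decreasing, non-negative activation function.
    """
    weights = [activation(q) for q in q_values]
    total = sum(weights)
    if total <= 0:
        raise ValueError("activation must give a positive total weight")
    return sum(w * q for w, q in zip(weights, q_values)) / total


def exp_activation(q, beta=1.0):
    """Exponential activation; beta is a hypothetical sharpness parameter."""
    return math.exp(beta * q)


# Hypothetical Q-value samples for a handful of candidate actions.
q_samples = [1.0, 2.0, 3.0]

# A constant activation weights every sample equally (plain mean).
uniform_value = activated_weighted_value(q_samples, lambda q: 1.0)

# An increasing activation shifts the estimate toward larger Q-values,
# while keeping it at or below the maximum sample.
soft_value = activated_weighted_value(q_samples, exp_activation)
```

Under these assumptions, a constant activation recovers the mean of the samples, and a steeper activation moves the estimate toward the maximum, which is one way to picture how the choice of activation trades off underestimation against overestimation.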

Code & Models

Repositories

dmksjfl/GD3 (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning