Regularly Updated Deterministic Policy Gradient Algorithm

Shuai Han; Wenbo Zhou; Shuai Lü; Jiayu Yu

arXiv:2007.00169 · cs.LG · July 2, 2020 · 1 citation

Open Access

TL;DR

This paper introduces the RUD algorithm, which improves the efficiency and stability of deterministic policy gradient learning by making better use of replay data and reducing the variance of Q-value estimation, outperforming previous methods in MuJoCo environments.

Contribution

The paper proposes the RUD algorithm and shows, theoretically and empirically, that it improves data utilization and Q-value stability in deterministic policy gradient methods.

Findings

01. RUD outperforms DDPG in MuJoCo environments.

02. Theoretical proof of improved data usage with RUD.

03. Lower variance in Q-value estimation with RUD.

Abstract

The Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most well-known reinforcement learning methods. However, it is inefficient and unstable in practical applications, and the bias and variance of the Q estimation in the target function are sometimes difficult to control. This paper proposes a Regularly Updated Deterministic (RUD) policy gradient algorithm to address these problems. It theoretically proves that the learning procedure with RUD makes better use of new data in the replay buffer than the traditional procedure. In addition, the low variance of the Q value in RUD is better suited to the current Clipped Double Q-learning strategy. The paper presents a comparison experiment against previous methods, an ablation experiment with the original DDPG, and other analytical experiments in MuJoCo environments. The experimental results…
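
For readers unfamiliar with the Clipped Double Q-learning strategy the abstract refers to, the following is a minimal sketch of the standard clipped target, y = r + γ · (1 − done) · min(Q1', Q2'). It is not the RUD update itself, which this page does not describe. The function name `clipped_double_q_target`, its parameters, and the default discount `gamma=0.99` are illustrative assumptions, and the next-state Q values are assumed to come from two target critics evaluated elsewhere.

```python
from typing import List, Sequence


def clipped_double_q_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q1: Sequence[float],
    next_q2: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
) -> List[float]:
    """Compute y = r + gamma * (1 - done) * min(Q1', Q2') per transition."""
    targets = []
    for r, done, q1, q2 in zip(rewards, dones, next_q1, next_q2):
        # Use the smaller of the two target-critic estimates to limit
        # overestimation of the bootstrapped value.
        min_q = min(q1, q2)
        # Terminal transitions do not bootstrap from the next state.
        bootstrap = 0.0 if done else gamma * min_q
        targets.append(r + bootstrap)
    return targets


if __name__ == "__main__":
    # Hypothetical two-transition batch; the second transition is terminal.
    print(clipped_double_q_target([1.0, 0.0], [False, True],
                                  [2.0, 5.0], [3.0, 4.0], gamma=0.5))
```

Because the target takes a minimum over two noisy estimates, the spread of those estimates affects how pessimistic the target becomes, which is one way to read the abstract's remark that a low-variance Q value suits this strategy.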

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Machine Learning and ELM

Methods: Convolution · Double Q-learning · Experience Replay · Weight Decay · Adam · Batch Normalization · Dense Connections · Deep Deterministic Policy Gradient · Q-Learning