Efficient Continuous Control with Double Actors and Regularized Critics

Jiafei Lyu; Xiaoteng Ma; Jiangpeng Yan; Xiu Li

arXiv:2106.03050 · cs.LG · June 8, 2021

TL;DR

This paper introduces the DARC algorithm, leveraging double actors and regularized critics to improve value estimation, exploration, and sample efficiency in continuous control reinforcement learning tasks.

Contribution

The paper explores double actors as a way to reduce value estimation bias and introduces a critic regularization technique to improve critic stability in continuous-control RL.

Findings

01. DARC outperforms state-of-the-art methods on continuous control tasks.

02. Double actors reduce over- and underestimation biases.

03. Regularization improves critic stability and sample efficiency.
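
The bias named in the second finding can be illustrated with a small numerical experiment. This is a minimal sketch and not an experiment from the paper: it assumes every true action value is zero and that each estimate carries independent Gaussian noise. Under those assumptions, a max over noisy estimates (as a greedy single-estimator target takes) is biased upward, and a min over two noisy estimates (as in clipped double Q-learning) is biased downward. The function name `estimate_bias` and all of its parameters are hypothetical.

```python
import numpy as np


def estimate_bias(n_actions=10, n_trials=100_000, noise_std=1.0, seed=0):
    """Return (mean of max over actions, mean of min over two estimators).

    All true values are zero (an assumption of this toy setup), so any
    nonzero mean returned here is pure estimation bias.
    """
    rng = np.random.default_rng(seed)
    # Two independent noisy estimators of the same (zero) action values.
    q1 = rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    q2 = rng.normal(0.0, noise_std, size=(n_trials, n_actions))

    # Single estimator, greedy over actions: tends to overestimate.
    over = q1.max(axis=1).mean()

    # Elementwise min of the two estimators: tends to underestimate.
    under = np.minimum(q1, q2).mean()
    return over, under


if __name__ == "__main__":
    over, under = estimate_bias()
    print(f"max over actions: {over:+.3f}")
    print(f"min over critics: {under:+.3f}")
```

The first number comes out positive and the second negative even though the true value is zero everywhere, which is the over- and underestimation the finding refers to.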

Abstract

How to obtain good value estimation is one of the key problems in Reinforcement Learning (RL). Current value estimation methods, such as DDPG and TD3, suffer from unnecessary over- or underestimation bias. In this paper, we explore the potential of double actors, which have long been neglected, for better value function estimation in the continuous setting. First, we uncover and demonstrate the bias alleviation property of double actors by building double actors upon a single critic and upon double critics, to handle the overestimation bias in DDPG and the underestimation bias in TD3, respectively. Next, we find that double actors help improve the exploration ability of the agent. Finally, to mitigate the uncertainty of the value estimate from double critics, we further propose to regularize the critic networks under the double-actor architecture, which gives rise to Double Actors…
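
To make the architecture described in the abstract concrete, here is one way a target value could be formed from two actors and two critics, together with a penalty on critic disagreement. This is a hedged sketch and not the authors' update rule: the abstract is truncated and does not specify the formulas. The mixing weight `nu`, the min-over-critics step, the squared-difference penalty, and the names `double_actor_target` and `critic_disagreement_penalty` are all assumptions made for illustration. The official repository listed under Code & Models is the place to check the actual method.

```python
import numpy as np


def double_actor_target(reward, next_state, actors, critics, gamma=0.99, nu=0.5):
    """Hypothetical TD target built from two actors and two critics.

    actors:  pair of callables, state -> action
    critics: pair of callables, (state, action) -> value
    nu:      assumed weight between the pessimistic and optimistic actor values
    """
    per_actor_values = []
    for actor in actors:
        action = actor(next_state)
        # Pessimistic value for this actor's action: min over the two critics.
        q_min = np.minimum(
            critics[0](next_state, action),
            critics[1](next_state, action),
        )
        per_actor_values.append(q_min)

    v_low = np.minimum(per_actor_values[0], per_actor_values[1])
    v_high = np.maximum(per_actor_values[0], per_actor_values[1])
    # Convex combination of the lower and higher per-actor values (assumed form).
    next_value = nu * v_low + (1.0 - nu) * v_high
    return reward + gamma * next_value


def critic_disagreement_penalty(q1, q2):
    """Assumed regularizer: mean squared gap between the two critics' outputs."""
    return np.mean((np.asarray(q1) - np.asarray(q2)) ** 2)
```

In this sketch, setting `nu=1.0` keeps only the lower of the two per-actor values and `nu=0.0` keeps only the higher one. The penalty would be added to the critic loss with some coefficient, which is also left as an assumption.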

Code & Models

Repositories

dmksjfl/DARC (PyTorch, official)

Videos

Efficient Continuous Control with Double Actors and Regularized Critics · Underline

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Adaptive Dynamic Programming Control

Methods: Weight Decay · Batch Normalization · Convolution · Clipped Double Q-learning · Experience Replay · Target Policy Smoothing · Dense Connections · Deep Deterministic Policy Gradient · Adam