Efficient Continuous Control with Double Actors and Regularized Critics
Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li

TL;DR
This paper introduces the DARC algorithm, leveraging double actors and regularized critics to improve value estimation, exploration, and sample efficiency in continuous control reinforcement learning tasks.
Contribution
It explores the use of double actors for bias reduction and introduces a regularization technique to enhance critic stability, advancing continuous RL methods.
Findings
DARC outperforms state-of-the-art methods on continuous control tasks.
Double actors reduce over- and underestimation biases.
Regularization improves critic stability and sample efficiency.
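The regularization finding can be illustrated with a minimal sketch. It assumes, as the abstract suggests, that the regularizer targets the uncertainty between the two critics' estimates; the exact form used in the paper is not given in this summary. The function name `regularized_critic_loss`, the squared-disagreement penalty, and the `reg_weight` parameter and its default are all hypothetical choices for illustration.

```python
import numpy as np

def regularized_critic_loss(q1, q2, target, reg_weight=0.005):
    """Hypothetical sketch: TD loss for two critics plus a disagreement penalty.

    q1, q2  : arrays of Q-value predictions from the two critics
    target  : array of shared TD targets
    reg_weight : illustrative weight on the penalty (an assumption, not from the paper)
    """
    q1, q2, target = map(np.asarray, (q1, q2, target))
    # Standard mean-squared TD error for each critic.
    td_loss = np.mean((q1 - target) ** 2) + np.mean((q2 - target) ** 2)
    # Penalize the gap between the two critics' estimates for the same inputs.
    disagreement = np.mean((q1 - q2) ** 2)
    return td_loss + reg_weight * disagreement
```

With `reg_weight=0` this reduces to the usual double-critic TD loss; a larger weight pulls the two critics toward agreement at the cost of making them less independent.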
Abstract
How to obtain good value estimation is one of the key problems in Reinforcement Learning (RL). Current value estimation methods, such as DDPG and TD3, suffer from unnecessary over- or underestimation bias. In this paper, we explore the potential of double actors, which have long been neglected, for better value function estimation in the continuous setting. First, we uncover and demonstrate the bias alleviation property of double actors by building double actors upon a single critic and upon double critics, to handle the overestimation bias in DDPG and the underestimation bias in TD3 respectively. Next, we find that double actors also help improve the exploration ability of the agent. Finally, to mitigate the uncertainty of the value estimates from double critics, we further propose to regularize the critic networks under the double actors architecture, which gives rise to Double Actors…
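One way to picture how a second actor could shift the bias in each direction is sketched below. This is an illustration under stated assumptions, not the paper's exact update rule, which this summary does not reproduce. With a single critic, taking the minimum over the two actors' proposed next actions lowers the target (countering overestimation); with two critics, taking the maximum over actors of the clipped (minimum over critics) value raises it (countering underestimation). The function names and the `gamma` default are hypothetical.

```python
import numpy as np

def single_critic_target(q_a1, q_a2, reward, done, gamma=0.99):
    """Assumed sketch: one critic evaluated at each actor's next action.

    q_a1, q_a2 : critic values Q(s', pi_1(s')) and Q(s', pi_2(s'))
    """
    q_a1, q_a2 = np.asarray(q_a1), np.asarray(q_a2)
    reward, done = np.asarray(reward), np.asarray(done)
    # Min over actors gives a more pessimistic target than a single actor would.
    next_value = np.minimum(q_a1, q_a2)
    return reward + gamma * (1.0 - done) * next_value

def double_critic_target(q1_a1, q2_a1, q1_a2, q2_a2, reward, done, gamma=0.99):
    """Assumed sketch: two critics, each evaluated at each actor's next action.

    qi_aj : value of critic i at actor j's proposed next action
    """
    reward, done = np.asarray(reward), np.asarray(done)
    # Clipped estimate per actor (min over critics, as in clipped double Q-learning).
    v_a1 = np.minimum(np.asarray(q1_a1), np.asarray(q2_a1))
    v_a2 = np.minimum(np.asarray(q1_a2), np.asarray(q2_a2))
    # Max over actors lifts the clipped estimate, making it less pessimistic.
    next_value = np.maximum(v_a1, v_a2)
    return reward + gamma * (1.0 - done) * next_value
```

Both functions operate on batches and zero out the bootstrap term wherever `done` is 1.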
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Adaptive Dynamic Programming Control
Methods: Weight Decay · Batch Normalization · Convolution · Clipped Double Q-learning · Experience Replay · Target Policy Smoothing · Dense Connections · Deep Deterministic Policy Gradient · Adam
