Double Actor-Critic with TD Error-Driven Regularization in Reinforcement Learning

Haohui Chen; Zhiyong Chen; Aoxiang Liu; and Wentuo Fang

arXiv:2409.19231 · cs.LG · October 1, 2024

TL;DR

This paper introduces TDDR, a novel double actor-critic reinforcement learning algorithm with TD error-driven regularization, achieving superior value estimation without extra hyperparameters and demonstrating strong performance in continuous control tasks.

Contribution

TDDR combines a double actor-critic architecture with TD error-driven critic regularization and, unlike existing double actor-critic algorithms, introduces no additional hyperparameters, improving value estimation in reinforcement learning.

Findings

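1. TDDR is competitive with benchmark algorithms on challenging continuous control tasks.

2. TDDR achieves better value estimation than classical deterministic policy gradient-based algorithms that lack a double actor-critic structure.

3. Unlike existing double actor-critic algorithms, TDDR introduces no additional hyperparameters, which simplifies design and implementation.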

Abstract

To obtain better value estimation in reinforcement learning, we propose a novel algorithm based on the double actor-critic framework with temporal difference error-driven regularization, abbreviated as TDDR. TDDR employs double actors, with each actor paired with a critic, thereby fully leveraging the advantages of double critics. Additionally, TDDR introduces an innovative critic regularization architecture. Compared to classical deterministic policy gradient-based algorithms that lack a double actor-critic structure, TDDR provides superior estimation. Moreover, unlike existing algorithms with double actor-critic frameworks, TDDR does not introduce any additional hyperparameters, significantly simplifying the design and implementation process. Experiments demonstrate that TDDR exhibits strong competitiveness compared to benchmark algorithms in challenging continuous control tasks.
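The abstract describes the architecture only at a high level: two actors, each paired with its own critic. The Python sketch that follows illustrates one common way such a pairing can produce bootstrap targets and per-critic TD errors. It is a hypothetical illustration, not the TDDR update from the paper: the pairing rule (actor i's next action scored by the minimum over all target critics, i.e. TD3-style clipped double-Q) is an assumption, TDDR's TD error-driven regularization itself is not reproduced, and the names `paired_targets`, `td_errors`, `Actor`, and `Critic` are invented for this example. Scalar states and actions keep it dependency-free.

```python
from typing import Callable, List, Sequence

# Hypothetical type aliases: scalar states/actions keep the sketch dependency-free.
Actor = Callable[[float], float]           # state -> action
Critic = Callable[[float, float], float]   # (state, action) -> Q-value


def paired_targets(reward: float, next_state: float, done: bool,
                   actors: Sequence[Actor], target_critics: Sequence[Critic],
                   gamma: float = 0.99) -> List[float]:
    """Return one bootstrap target y_i per actor-critic pair.

    Assumption (NOT the TDDR rule): actor i proposes the next action, and the
    pessimistic minimum over all target critics evaluates it (clipped double-Q).
    """
    targets = []
    for actor in actors:
        next_action = actor(next_state)
        q_next = min(critic(next_state, next_action) for critic in target_critics)
        # Terminal transitions do not bootstrap.
        targets.append(reward + (0.0 if done else gamma * q_next))
    return targets


def td_errors(state: float, action: float, critics: Sequence[Critic],
              targets: Sequence[float]) -> List[float]:
    """TD error of critic i against its paired target: delta_i = y_i - Q_i(s, a)."""
    return [y - critic(state, action) for critic, y in zip(critics, targets)]


if __name__ == "__main__":
    # Toy linear actors and critics, purely for illustration.
    actors = [lambda s: 0.5 * s, lambda s: -0.5 * s]
    critics = [lambda s, a: s + a, lambda s, a: s - a]
    ys = paired_targets(1.0, 2.0, False, actors, critics, gamma=0.5)
    print(ys)                                  # [1.5, 1.5]
    print(td_errors(1.0, 0.5, critics, ys))    # [0.0, 1.0]
```

Taking the minimum over critics counters the overestimation bias of a single bootstrapped critic at the cost of some underestimation; the per-pair TD errors delta_i are the kind of signal a TD error-driven scheme could use, though how TDDR actually uses them is specified only in the paper.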

Taxonomy

Topics: Reinforcement Learning in Robotics