Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework

Zongwei Liu; Yonghong Song; Yuanlin Zhang

arXiv:2301.03887 · cs.LG · January 11, 2023 · 1 citation

Open Access

TL;DR

This paper introduces an actor-director-critic framework with an improved double estimator for deep reinforcement learning, enhancing decision-making and convergence speed in various environments.

Contribution

The paper proposes a novel actor-director-critic framework and an improved double estimator method, advancing reinforcement learning performance and stability.

Findings

1. Faster convergence speed in experiments.
2. Higher total return achieved.
3. Improved stability with double estimators.

Abstract

In this paper, we propose actor-director-critic, a new framework for deep reinforcement learning. Compared with the actor-critic framework, it adds a director role and applies action classification and action evaluation simultaneously to improve the agent's decision-making performance. First, the agent's actions are divided into high-quality and low-quality actions according to the rewards returned from the environment. Then, the director network is trained to discriminate between high- and low-quality actions and to guide the actor network, reducing repeated exploration of low-quality actions in the early stage of training. In addition, we propose an improved double estimator method to better address the overestimation problem in reinforcement learning. For the two critic networks used, we design two target critic networks for…
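The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The fixed reward `threshold` used to split actions into high and low quality is an assumption made for illustration, since the abstract does not say how the split is made. The target computation shown is the standard clipped double Q-learning rule (as used in TD3), not the paper's improved double estimator, whose details are cut off above. Both function names are hypothetical.

```python
from typing import List, Sequence


def label_actions_by_reward(rewards: Sequence[float], threshold: float) -> List[int]:
    """Label each action 1 (high quality) or 0 (low quality).

    Assumption: an action counts as high quality when its reward exceeds
    a fixed threshold. The paper's actual criterion may differ.
    """
    return [1 if r > threshold else 0 for r in rewards]


def clipped_double_q_target(
    reward: float,
    done: bool,
    gamma: float,
    q1_next: float,
    q2_next: float,
) -> float:
    """Standard clipped double Q-learning target (TD3-style).

    Takes the minimum of two target-critic estimates to curb
    overestimation. This is the baseline technique, not the paper's
    improved double estimator.
    """
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


# Labels like these could serve as binary targets for a director classifier.
rewards = [0.2, -1.0, 0.7, 0.0]
labels = label_actions_by_reward(rewards, threshold=0.0)

# The smaller of the two critic estimates is bootstrapped.
target = clipped_double_q_target(
    reward=1.0, done=False, gamma=0.5, q1_next=4.0, q2_next=2.0
)
```

In this sketch a reward exactly equal to the threshold is treated as low quality, and a terminal transition (`done=True`) drops the bootstrapped term so the target equals the immediate reward.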

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Dense Connections · Adam · Target Policy Smoothing · Clipped Double Q-learning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Experience Replay · Twin Delayed Deep Deterministic