Actor-Critic Reinforcement Learning with Phased Actor

Ruofan Wu; Junmin Zhong; Jennie Si

arXiv:2404.11834·cs.LG·April 19, 2024·1 cites

Open Access

TL;DR

This paper introduces PAAC, a novel actor-critic reinforcement learning method that enhances policy gradient estimation, leading to improved control policies with higher robustness, faster learning, and better performance in continuous control tasks.

Contribution

The paper proposes PAAC, a phased actor-critic approach that improves policy gradient estimation, proves convergence and stability, and demonstrates superior performance over existing methods.

Findings

01. PAAC reduces variance in policy gradient estimates.

02. PAAC improves learning speed and robustness.

03. PAAC outperforms baseline algorithms in control tasks.

Abstract

Policy gradient methods in actor-critic reinforcement learning (RL) have become perhaps the most promising approaches to solving continuous optimal control problems. However, the trial-and-error nature of RL and the inherent randomness associated with solution approximations cause variations in the learned optimal values and policies. This has significantly hindered their successful deployment in real-life applications where control responses need to meet dynamic performance criteria deterministically. Here we propose a novel phased actor in actor-critic (PAAC) method, aiming at improving policy gradient estimation and thus the quality of the control policy. Specifically, PAAC accounts for both $Q$ value and TD error in its actor update. We prove qualitative properties of PAAC for learning convergence of the value and policy, solution optimality, and stability of system dynamics.…
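
The abstract names two quantities that enter the actor update, the $Q$ value and the TD error, without giving the update rule itself. As a rough illustration only, the sketch below computes the standard one-step TD error for a single transition and then mixes it with the $Q$ value using a weight. The function names, the `weight` parameter, and the linear blend are all assumptions made for this example and are not taken from the paper; the actual PAAC update may differ.

```python
def td_error(reward, q_sa, q_next, gamma=0.99, done=False):
    """Standard one-step TD error: r + gamma * Q(s', a') - Q(s, a)."""
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma * q_next
    return reward + bootstrap - q_sa


def actor_signal(q_sa, delta, weight):
    """Hypothetical blend of Q value and TD error (NOT the paper's rule).

    weight = 0.0 uses only the Q value; weight = 1.0 uses only the TD error.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * q_sa + weight * delta


# Toy transition with made-up numbers: r = 1.0, Q(s, a) = 2.0, Q(s', a') = 3.0.
delta = td_error(reward=1.0, q_sa=2.0, q_next=3.0, gamma=0.5)
signal = actor_signal(q_sa=2.0, delta=delta, weight=0.5)
print(delta, signal)
```

In this toy setting, moving `weight` from 0 toward 1 shifts the signal that would drive the actor from the raw $Q$ value toward the TD error. Whether and how PAAC schedules such a shift is not stated in the abstract.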

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings