Recursive Least Squares Advantage Actor-Critic Algorithms

Yuan Wang; Chunyuan Zhang; Tianzong Yu; Meng Ma

arXiv:2201.05918·cs.LG·February 15, 2022·1 cites

Open Access

TL;DR

This paper introduces two RLS-based A2C algorithms that enhance sample and computational efficiency in deep reinforcement learning, demonstrating superior performance on Atari and MuJoCo benchmarks.

Contribution

The paper proposes novel RLS-based A2C algorithms, RLSSA2C and RLSNA2C, integrating recursive least squares into deep actor-critic training for improved efficiency.

Findings

1. Both algorithms outperform vanilla A2C in sample efficiency.

2. They achieve higher computational efficiency than state-of-the-art methods.

3. Experimental results confirm effectiveness across diverse environments.

Abstract

As an important algorithm in deep reinforcement learning, advantage actor-critic (A2C) has achieved wide success in both discrete and continuous control tasks with raw pixel inputs, but its sample efficiency still needs improvement. In traditional reinforcement learning, actor-critic algorithms generally use the recursive least squares (RLS) method to update the parameters of linear function approximators and accelerate convergence. However, A2C algorithms seldom use this method to train deep neural networks (DNNs) to improve their sample efficiency. In this paper, we propose two novel RLS-based A2C algorithms and investigate their performance. Both proposed algorithms, called RLSSA2C and RLSNA2C, use the RLS method to train the critic network and the hidden layers of the actor network. The main difference between them lies in the policy learning step. RLSSA2C…
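For readers unfamiliar with RLS, the following is a minimal sketch of the textbook recursive least squares update for a linear function approximator, the classical setting the abstract refers to. It is not the paper's RLSSA2C or RLSNA2C procedure, which applies RLS to deep networks. The function names `rls_update` and `fit_rls`, the forgetting factor `lam`, and the initial covariance scale `delta` are illustrative assumptions.

```python
import numpy as np

def rls_update(w, P, x, y, lam=1.0):
    """One textbook RLS step for a linear model y ~ w @ x.

    w: weight vector, P: inverse-correlation matrix estimate,
    lam: forgetting factor (1.0 means no forgetting). Names are illustrative.
    """
    Px = P @ x
    k = Px / (lam + x @ Px)            # gain vector
    err = y - w @ x                    # prediction error before the update
    w_new = w + k * err                # correct weights along the gain
    P_new = (P - np.outer(k, Px)) / lam
    return w_new, P_new

def fit_rls(X, Y, lam=1.0, delta=1e3):
    """Run RLS over all samples, starting from w = 0 and P = delta * I."""
    n = X.shape[1]
    w = np.zeros(n)
    P = delta * np.eye(n)
    for x, y in zip(X, Y):
        w, P = rls_update(w, P, x, y, lam)
    return w

# Synthetic, noiseless data generated purely for illustration.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(200, 3))
Y = X @ true_w
w_hat = fit_rls(X, Y)
```

On this noiseless synthetic problem, `w_hat` should land very close to `true_w` after a single pass over the data. This single-pass convergence is the property that motivates using RLS in place of plain gradient steps.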

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control

Methods: A2C