Recursive Least Squares Advantage Actor-Critic Algorithms
Yuan Wang, Chunyuan Zhang, Tianzong Yu, Meng Ma

TL;DR
This paper introduces two RLS-based A2C algorithms that enhance sample and computational efficiency in deep reinforcement learning, demonstrating superior performance on Atari and MuJoCo benchmarks.
Contribution
The paper proposes novel RLS-based A2C algorithms, RLSSA2C and RLSNA2C, integrating recursive least squares into deep actor-critic training for improved efficiency.
Findings
Both algorithms outperform vanilla A2C in sample efficiency.
They achieve higher computational efficiency than state-of-the-art methods.
Experimental results on Atari and MuJoCo benchmarks confirm their effectiveness.
Abstract
As an important algorithm in deep reinforcement learning, advantage actor-critic (A2C) has achieved wide success in both discrete and continuous control tasks with raw pixel inputs, but its sample efficiency still needs improvement. In traditional reinforcement learning, actor-critic algorithms generally use the recursive least squares (RLS) technique to update the parameters of linear function approximators and so accelerate convergence. However, A2C algorithms seldom use this technique to train deep neural networks (DNNs) to improve sample efficiency. In this paper, we propose two novel RLS-based A2C algorithms and investigate their performance. Both proposed algorithms, called RLSSA2C and RLSNA2C, use the RLS method to train the critic network and the hidden layers of the actor network. The main difference between them lies in the policy learning step. RLSSA2C…
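For readers unfamiliar with the RLS technique the abstract refers to, here is a minimal sketch of the classical recursive least squares update for a linear function approximator. It is not the paper's RLSSA2C or RLSNA2C procedure, which applies RLS to deep networks. The class name `RLSLinear`, the forgetting factor, the initial covariance scale, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


class RLSLinear:
    """Classical RLS for a linear model y ~ w . x (illustrative, not the paper's method)."""

    def __init__(self, n_features, forgetting=1.0, init_scale=1e3):
        self.w = np.zeros(n_features)              # weight estimate
        self.P = np.eye(n_features) * init_scale   # inverse-correlation matrix estimate
        self.lam = forgetting                      # forgetting factor, 0 < lam <= 1

    def predict(self, x):
        return float(self.w @ x)

    def update(self, x, target):
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)            # gain vector
        error = target - self.w @ x                # a-priori prediction error
        self.w = self.w + gain * error             # weight correction
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return error


# Synthetic, noise-free regression problem (hypothetical data).
rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.0, 2.0])
model = RLSLinear(n_features=3)
for _ in range(200):
    x = rng.normal(size=3)
    model.update(x, true_w @ x)
```

Each sample is processed once, and the matrix `P` carries second-order information from all earlier samples. This is why RLS typically converges in fewer samples than a plain gradient step on the same linear model, at the cost of storing and updating `P`.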
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
Methods: A2C
