Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Raghuram Bharadwaj Diddigi; Prateek Jain; Prabuchandran K.J.; Shalabh Bhatnagar

arXiv:2110.10017·cs.LG·June 16, 2022

TL;DR

This paper introduces an off-policy natural actor-critic algorithm that works with neural network function approximation and uses state-action distribution correction to learn from data generated by a behavior policy.

Contribution

It proposes a neural network-compatible off-policy natural actor-critic method with convergence guarantees using compatible features.

Findings

01 Outperforms vanilla gradient actor-critic on benchmark tasks

02 Guarantees convergence to a local optimum with neural network function approximation

03 Enables flexible policy and value function approximation with neural networks

Abstract

Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL, where an agent's objective is to compute an optimal policy from data generated by a given policy (known as the behavior policy). Because the optimal policy can be very different from the behavior policy, learning optimal behavior is much harder in the "off-policy" setting than in the "on-policy" setting, where new data collected under each updated policy is used for learning. This work proposes an off-policy natural actor-critic algorithm that uses state-action distribution correction to handle the off-policy behavior and the natural policy gradient for sample efficiency. The existing natural gradient-based actor-critic algorithms with convergence guarantees require fixed features for approximating both policy and value…
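To make the two ingredients named in the abstract concrete, here is a minimal sketch rather than the paper's algorithm. It assumes a single-state problem (a bandit) with a softmax policy, a known behavior policy `mu`, and an exactly known correction ratio `rho = pi / mu`; in the setting the abstract describes, the ratio is over state-action distributions and has to be estimated. The names `softmax`, `natural_gradient_demo`, `theta`, `rewards`, and `mu` are all hypothetical. The sketch illustrates the standard compatible-features result: when the advantage is fitted with features `psi(a) = grad log pi(a)`, the fitted weights coincide with the natural gradient (Fisher pseudo-inverse times the vanilla gradient).

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def natural_gradient_demo(theta, rewards, mu):
    """Toy single-state example; all quantities are exact expectations."""
    pi = softmax(theta)
    k = len(theta)
    # Compatible features: row a is grad_theta log pi(a) = e_a - pi.
    psi = np.eye(k) - pi
    # Advantage of each action under the target policy pi.
    adv = rewards - pi @ rewards
    # Off-policy correction: expectations are taken under the behavior
    # policy mu and reweighted by rho = pi / mu (assumed known here).
    rho = pi / mu
    weights = mu * rho
    # Vanilla policy gradient: E_mu[rho * psi * adv].
    grad = (weights[:, None] * psi * adv[:, None]).sum(axis=0)
    # Fisher information matrix: E_mu[rho * psi psi^T] (singular for softmax).
    fisher = (psi.T * weights) @ psi
    nat_from_fisher = np.linalg.pinv(fisher) @ grad
    # Compatible critic: weighted least-squares fit of adv by w^T psi.
    sw = np.sqrt(weights)
    w, *_ = np.linalg.lstsq(psi * sw[:, None], adv * sw, rcond=None)
    return grad, nat_from_fisher, w, adv

theta = np.array([0.5, -0.2, 0.1])
rewards = np.array([1.0, 0.0, 2.0])
mu = np.array([0.5, 0.3, 0.2])
grad, nat_from_fisher, w, adv = natural_gradient_demo(theta, rewards, mu)
print("vanilla gradient:", grad)
print("natural gradient:", nat_from_fisher)
print("critic weights:  ", w)
```

In this toy case the natural gradient no longer carries the `pi(a)` scaling that appears in the vanilla gradient, which is one common intuition for why natural-gradient updates can be less sensitive to the policy parameterization.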

Taxonomy

Topics: Reinforcement Learning in Robotics