Constrained Policy Gradient Method for Safe and Fast Reinforcement Learning: a Neural Tangent Kernel Based Approach

Balázs Varga; Balázs Kulcsár; Morteza Haghir Chehreghani

arXiv:2107.09139 · cs.LG · January 24, 2022

TL;DR

This paper introduces a neural tangent kernel-based constrained policy gradient method for safe and accelerated reinforcement learning, demonstrating improved learning speed and transparency in simulation environments.

Contribution

It presents a novel, practical application of neural tangent kernels to reinforcement learning, using constraints to make learning both safe and efficient.

Findings

01

Constraints improve learning speed and transparency.

02

The method is effective in the Cartpole and Lunar Lander environments.

03

Neural tangent kernel enables policy evaluation at arbitrary states.
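The third finding rests on the lazy-learning argument in the abstract: with a small step size the network is approximately linear in its parameters, so the change that one policy-gradient step causes at any state s can be predicted from the neural tangent kernel as Δf(s) ≈ α Σ_i G_i c_i K(s, s_i), where c_i is the derivative of log π(a_i|s_i) with respect to the network output. The Python sketch below illustrates this under assumptions that are not taken from the paper: a hypothetical one-hidden-layer tanh network that outputs the logit of a binary-action policy π(a=1|s) = sigmoid(f(s)), a plain REINFORCE update, and made-up states, actions, and returns. It compares the NTK prediction at a state outside the batch with the change actually produced by the step.

```python
import math
import random

def init_params(dim, width, seed=0):
    # Hypothetical one-hidden-layer tanh network: f(s) = sum_h v_h * tanh(w_h . s + b_h)
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    b = [rng.gauss(0.0, 1.0) for _ in range(width)]
    v = [rng.gauss(0.0, 1.0 / math.sqrt(width)) for _ in range(width)]
    return W, b, v

def hidden(wh, bh, s):
    return math.tanh(sum(w * x for w, x in zip(wh, s)) + bh)

def forward(params, s):
    W, b, v = params
    return sum(vh * hidden(wh, bh, s) for wh, bh, vh in zip(W, b, v))

def grad(params, s):
    # Flattened gradient of f(s) w.r.t. (W, b, v), in that order
    W, b, v = params
    gW, gb, gv = [], [], []
    for wh, bh, vh in zip(W, b, v):
        t = hidden(wh, bh, s)
        d = vh * (1.0 - t * t)
        gW.extend(d * x for x in s)
        gb.append(d)
        gv.append(t)
    return gW + gb + gv

def ntk(params, s1, s2):
    # Empirical neural tangent kernel K(s1, s2) = grad f(s1) . grad f(s2)
    return sum(x * y for x, y in zip(grad(params, s1), grad(params, s2)))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_coeff(params, s, a):
    # d log pi(a|s) / d f(s) for a binary policy pi(a=1|s) = sigmoid(f(s))
    return a - sigmoid(forward(params, s))

def predict_change(params, batch, alpha, s_query):
    # NTK prediction of the change of f at s_query, which need not be in the batch
    return alpha * sum(G * score_coeff(params, s, a) * ntk(params, s_query, s)
                       for s, a, G in batch)

def pg_step(params, batch, alpha):
    # One REINFORCE-style gradient-ascent step; all coefficients use the pre-update params
    W, b, v = params
    dim, width = len(W[0]), len(W)
    flat = [w for wh in W for w in wh] + list(b) + list(v)
    for s, a, G in batch:
        c = alpha * G * score_coeff(params, s, a)
        flat = [p + c * g for p, g in zip(flat, grad(params, s))]
    off = width * dim
    W_new = [flat[h * dim:(h + 1) * dim] for h in range(width)]
    return W_new, flat[off:off + width], flat[off + width:]

params = init_params(dim=2, width=16)
batch = [([0.1, -0.2], 1, 1.0), ([0.5, 0.3], 0, 0.5), ([-0.4, 0.8], 1, 2.0)]  # (s, a, G)
s_query = [0.2, 0.2]  # a state that is not in the batch
errors = {}
for alpha in (1e-3, 1e-4):
    predicted = predict_change(params, batch, alpha, s_query)
    actual = forward(pg_step(params, batch, alpha), s_query) - forward(params, s_query)
    errors[alpha] = abs(predicted - actual)
    print(f"alpha={alpha}: predicted={predicted:.6e} actual={actual:.6e}")
```

The gap between the NTK prediction and the actual update is second order in the step size, so shrinking alpha tenfold should shrink it roughly a hundredfold; this is consistent with the abstract's motivation for slowing learning down.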

Abstract

This paper presents a constrained policy gradient algorithm. We introduce constraints for safe learning with the following steps. First, learning is slowed down (lazy learning) so that the episodic policy change can be computed with the help of the policy gradient theorem and the neural tangent kernel. This, in turn, enables us to evaluate the policy at arbitrary states as well. In the same spirit, learning can be guided, ensuring safety, by augmenting episode batches with states where the desired action probabilities are prescribed. Finally, exogenous discounted sums of future rewards (returns) can be computed at these specific state-action pairs such that the policy network satisfies the constraints. Computing the returns is based on solving a system of linear equations (equality constraints) or a constrained quadratic program (inequality constraints, regional constraints). Simulation…
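The equality-constraint step can be sketched as follows. The assumptions are not the paper's exact formulation: to keep the kernel exact and the code short, the network is replaced by a hypothetical model that is linear in fixed random tanh features (the lazy-learning limit), the policy is binary with π(a=1|s) = sigmoid(f(s)), the update is plain REINFORCE, and each constraint state s_c carries action a = 1 with a prescribed probability p_c. The unknown returns G_c attached to the constraint states enter the predicted logit change linearly, so requiring sigmoid(f(s_c)) = p_c after the step gives one linear equation per constraint state. Names such as constraint_returns are illustrative only.

```python
import math
import random

def make_features(dim, n_feat, seed=0):
    # Hypothetical fixed random tanh features; the policy logit is f(s) = theta . phi(s)
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_feat)]
    c = [rng.gauss(0.0, 1.0) for _ in range(n_feat)]
    def phi(s):
        return [math.tanh(sum(u * x for u, x in zip(uj, s)) + cj) for uj, cj in zip(U, c)]
    return phi

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def solve(A, r):
    # Gaussian elimination with partial pivoting for the square system A g = r
    n = len(r)
    M = [row[:] + [rv] for row, rv in zip(A, r)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    g = [0.0] * n
    for i in reversed(range(n)):
        g[i] = (M[i][n] - sum(M[i][j] * g[j] for j in range(i + 1, n))) / M[i][i]
    return g

def constraint_returns(theta, phi, batch, constraints, alpha):
    """Returns G_c for the constraint states so that, after one policy-gradient
    step on the augmented batch, pi(a=1 | s_c) equals the prescribed p_c."""
    def kernel(s1, s2):
        # NTK of a model that is linear in fixed features: K(s1, s2) = phi(s1) . phi(s2)
        return dot(phi(s1), phi(s2))
    def coeff(s, a):
        # d log pi(a|s) / d f(s) for pi(a=1|s) = sigmoid(f(s))
        return a - sigmoid(dot(theta, phi(s)))
    A, r = [], []
    for s_k, p_k in constraints:
        # Logit change at s_k caused by the ordinary episode batch alone
        drift = alpha * sum(G * coeff(s, a) * kernel(s_k, s) for s, a, G in batch)
        r.append(logit(p_k) - dot(theta, phi(s_k)) - drift)
        # Logit change at s_k per unit return placed on each constraint state s_j (action 1)
        A.append([alpha * coeff(s_j, 1) * kernel(s_k, s_j) for s_j, _ in constraints])
    return solve(A, r)

def pg_step(theta, phi, batch, alpha):
    # REINFORCE-style ascent: theta += alpha * sum_i G_i * (a_i - sigmoid(f(s_i))) * phi(s_i)
    new = theta[:]
    for s, a, G in batch:
        c = alpha * G * (a - sigmoid(dot(theta, phi(s))))
        new = [t + c * x for t, x in zip(new, phi(s))]
    return new

n_feat = 32
phi = make_features(dim=2, n_feat=n_feat)
theta = [0.0] * n_feat
alpha = 0.01
batch = [([0.1, -0.2], 1, 1.0), ([0.5, 0.3], 0, 0.5), ([-0.4, 0.8], 1, 2.0)]  # (s, a, G)
constraints = [([0.9, 0.9], 0.95), ([-0.9, -0.5], 0.10)]  # (s_c, prescribed pi(1|s_c))
G_c = constraint_returns(theta, phi, batch, constraints, alpha)
augmented = batch + [(s, 1, G) for (s, _), G in zip(constraints, G_c)]
theta_new = pg_step(theta, phi, augmented, alpha)
for s, p in constraints:
    print(p, sigmoid(dot(theta_new, phi(s))))
```

Because this toy model is exactly linear in its parameters, the prescribed probabilities are met up to rounding error; with a real network they would hold only to the extent that the lazy-learning linearization holds. The abstract states that inequality and regional constraints are handled with a constrained quadratic program instead, which this sketch does not cover.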

Code & Models

Repositories

bva-bme/Constrained_Policy_Gradient
pytorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications