Constrained Policy Gradient Method for Safe and Fast Reinforcement Learning: a Neural Tangent Kernel Based Approach
Balázs Varga, Balázs Kulcsár, Morteza Haghir Chehreghani

TL;DR
This paper introduces a neural tangent kernel-based constrained policy gradient method for safe and accelerated reinforcement learning, demonstrating improved learning speed and transparency in simulation environments.
Contribution
It presents a novel, practical application of neural tangent kernels to constrained reinforcement learning, using constraints to improve both safety and learning efficiency.
Findings
Constraints improve learning speed and transparency.
The method is effective in the Cartpole and Lunar Lander environments.
The neural tangent kernel enables policy evaluation at arbitrary states.
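The last finding can be illustrated with a minimal NumPy sketch. It rests on several stated assumptions: a hypothetical two-action policy π(a=1|s) = sigmoid(f(s; θ)) with a small tanh network, synthetic episode data, and a plain REINFORCE-style step. With a small step size α (the lazy regime), the change of f at states outside the batch is predicted to first order by the empirical NTK: Δf(s) ≈ α Σ_i Θ(s, s_i) G_i (a_i − p_i). The network, the sizes, and the helper names (`logit`, `grad_logit`, `empirical_ntk`, `apply_update`) are illustrative and are not the authors' implementation.

```python
import numpy as np

# Hypothetical two-action policy: pi(a=1|s) = sigmoid(f(s; theta)), where f is a
# one-hidden-layer tanh network. Architecture and sizes are assumptions.
rng = np.random.default_rng(0)
D, H = 4, 16  # state dimension (Cartpole-like) and hidden width, both assumed

def init_params():
    return {
        "W": rng.normal(size=(H, D)) / np.sqrt(D),
        "b": np.zeros(H),
        "v": rng.normal(size=H) / np.sqrt(H),
    }

def logit(params, S):
    """Network output f(s) for a batch of states S with shape (N, D)."""
    return np.tanh(S @ params["W"].T + params["b"]) @ params["v"]

def grad_logit(params, S):
    """Per-state gradient of f w.r.t. (W, b, v), flattened to shape (N, P)."""
    T = np.tanh(S @ params["W"].T + params["b"])       # hidden activations, (N, H)
    dpre = (1.0 - T**2) * params["v"]                  # df/d(pre-activation), (N, H)
    dW = dpre[:, :, None] * S[:, None, :]              # df/dW, (N, H, D)
    return np.concatenate([dW.reshape(len(S), -1), dpre, T], axis=1)

def apply_update(params, dtheta):
    """Add a flat parameter step dtheta (same ordering as grad_logit)."""
    nW = H * D
    return {
        "W": params["W"] + dtheta[:nW].reshape(H, D),
        "b": params["b"] + dtheta[nW:nW + H],
        "v": params["v"] + dtheta[nW + H:],
    }

def empirical_ntk(params, S1, S2):
    """Empirical NTK: Theta(s, s') = grad f(s) . grad f(s')."""
    return grad_logit(params, S1) @ grad_logit(params, S2).T

params = init_params()
S = rng.normal(size=(8, D))          # episode states (synthetic)
A = rng.integers(0, 2, size=8)       # actions taken (synthetic)
G = rng.normal(size=8)               # returns (synthetic)
lr = 1e-4                            # small step size: the "lazy" learning regime

p = 1.0 / (1.0 + np.exp(-logit(params, S)))
# For a sigmoid policy, grad log pi(a|s) = (a - p) * grad f(s).
c = G * (A - p)
dtheta = lr * grad_logit(params, S).T @ c          # policy-gradient ascent step

S_query = rng.normal(size=(5, D))    # arbitrary states that are not in the batch
pred = lr * empirical_ntk(params, S_query, S) @ c  # NTK prediction of delta f
actual = logit(apply_update(params, dtheta), S_query) - logit(params, S_query)
rel_err = np.linalg.norm(actual - pred) / np.linalg.norm(pred)
print(f"relative error of the NTK prediction: {rel_err:.2e}")
```

The gap between the actual and predicted change is second order in the step, so shrinking `lr` makes the kernel prediction more accurate. This is why slowing learning down makes the policy change at unseen states computable.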
Abstract
This paper presents a constrained policy gradient algorithm. We introduce constraints for safe learning in the following steps. First, learning is slowed down (lazy learning) so that the episodic policy change can be computed with the help of the policy gradient theorem and the neural tangent kernel. This, in turn, enables us to evaluate the policy at arbitrary states as well. In the same spirit, learning can be guided to ensure safety by augmenting episode batches with states at which the desired action probabilities are prescribed. Finally, exogenous discounted sums of future rewards (returns) can be computed at these specific state-action pairs such that the policy network satisfies the constraints. Computing the returns amounts to solving a system of linear equations (equality constraints) or a constrained quadratic program (inequality and regional constraints). Simulation…
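The equality-constraint step described in the abstract can be sketched under similar assumptions. The batch is augmented with hypothetical constraint states S_c, the prescribed action at each is a = 1, and a desired first-order change of the logit is set (an illustrative value). Under the lazy (linearized) approximation, the unknown returns G_c attached to these states enter the predicted change linearly, so they follow from a single linear solve. The network, the target value, and every helper name are hypothetical and serve only to show the structure of the computation. Inequality and regional constraints would instead need the constrained quadratic program mentioned in the abstract, and this sketch does not cover them.

```python
import numpy as np

# Hypothetical setup (names and sizes are assumptions): a two-action policy
# pi(a=1|s) = sigmoid(f(s)), with f a one-hidden-layer tanh network.
rng = np.random.default_rng(1)
D, H = 4, 16
W = rng.normal(size=(H, D)) / np.sqrt(D)
b = np.zeros(H)
v = rng.normal(size=H) / np.sqrt(H)

def logit(S):
    return np.tanh(S @ W.T + b) @ v

def grad_logit(S):
    """Flattened gradient of f w.r.t. (W, b, v) for each state, shape (N, P)."""
    T = np.tanh(S @ W.T + b)
    dpre = (1.0 - T**2) * v
    dW = (dpre[:, :, None] * S[:, None, :]).reshape(len(S), -1)
    return np.concatenate([dW, dpre, T], axis=1)

def ntk(S1, S2):
    """Empirical NTK between two sets of states."""
    return grad_logit(S1) @ grad_logit(S2).T

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1e-4
# Ordinary episode batch (synthetic): states, actions, returns.
S = rng.normal(size=(8, D))
A = rng.integers(0, 2, size=8)
G = rng.normal(size=8)
c = G * (A - sigmoid(logit(S)))       # per-sample policy-gradient weights

# Augmented constraint states with a prescribed action and a desired logit change.
S_c = rng.normal(size=(3, D))
A_c = np.ones(3)                      # prescribed action a = 1 at every constraint state
target = np.full(3, 0.01)             # hypothetical desired change of f at S_c
w_c = A_c - sigmoid(logit(S_c))       # (a - p) factor of grad log pi at S_c

# Linearized change at S_c: lr * (K_cb @ c + K_cc @ (w_c * G_c)) must equal target.
K_cb = ntk(S_c, S)
K_cc = ntk(S_c, S_c)
M = lr * K_cc * w_c                   # broadcasting scales column j by w_c[j]
rhs = target - lr * K_cb @ c
G_c = np.linalg.solve(M, rhs)         # exogenous returns for the constraint states

predicted = lr * (K_cb @ c + K_cc @ (w_c * G_c))
print("exogenous returns:", G_c)
print("predicted logit change at constraint states:", predicted)
```

The matrix `M` is invertible here because `K_cc` is a Gram matrix of distinct gradient vectors and every entry of `w_c` is positive. The constraints therefore hold exactly at first order. In the true network, they hold only approximately, to the extent that learning stays lazy.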
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
