Differentiable Trust Region Layers for Deep Reinforcement Learning
Fabian Otto, Philipp Becker, Ngo Anh Vien, Hanna Carolin Ziesche, and Gerhard Neumann

TL;DR
This paper introduces differentiable trust region layers for deep reinforcement learning that enforce trust regions via closed-form projections, improving robustness and ease of implementation over existing approximation-based methods.
Contribution
It proposes novel differentiable neural network layers for trust region enforcement in deep RL, formalizing state-specific trust regions with closed-form solutions based on various divergence measures.
Findings
Achieves results comparable to or better than existing methods.
Layers are nearly agnostic to implementation choices.
Formalizes trust regions for each state individually.
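To illustrate why a per-state trust region differs from one enforced only on average, here is a minimal sketch. It assumes 1-D Gaussian policies and uses the KL divergence as the measure. The four states, the bound `epsilon`, and all variable names are hypothetical and not taken from the paper.

```python
import math

def gaussian_kl(mean_new, std_new, mean_old, std_old):
    """KL(N(mean_new, std_new^2) || N(mean_old, std_old^2)) for 1-D Gaussians."""
    return (math.log(std_old / std_new)
            + (std_new ** 2 + (mean_new - mean_old) ** 2) / (2.0 * std_old ** 2)
            - 0.5)

# Hypothetical (mean, std) of the action distribution in four states.
old_policy = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
new_policy = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.6, 1.0)]
epsilon = 0.05  # assumed trust region bound

per_state_kl = [
    gaussian_kl(mn, sn, mo, so)
    for (mn, sn), (mo, so) in zip(new_policy, old_policy)
]

# An averaged constraint only looks at the mean KL over states.
average_kl = sum(per_state_kl) / len(per_state_kl)
average_ok = average_kl <= epsilon

# A per-state constraint checks every state on its own.
violating_states = [i for i, kl in enumerate(per_state_kl) if kl > epsilon]
```

In this toy setup the averaged KL stays within the bound while the last state individually exceeds it, which is the kind of violation a state-wise formulation is meant to rule out.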
Abstract
Trust region methods are a popular tool in reinforcement learning as they yield robust policy updates in continuous and discrete action spaces. However, enforcing such trust regions in deep reinforcement learning is difficult. Hence, many approaches, such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), are based on approximations. Due to those approximations, they violate the constraints or fail to find the optimal solution within the trust region. Moreover, they are difficult to implement, often lack sufficient exploration, and have been shown to depend on seemingly unrelated implementation choices. In this work, we propose differentiable neural network layers to enforce trust regions for deep Gaussian policies via closed-form projections. Unlike existing methods, those layers formalize trust regions for each state individually and can complement…
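As a rough illustration of what a closed-form projection can look like, the sketch below projects a predicted Gaussian mean back into a per-state trust region. It is a simplified stand-in rather than the authors' implementation. It handles only the mean, assumes a diagonal covariance, and bounds the squared Mahalanobis distance to the old mean. The function names and the bound `epsilon` are hypothetical.

```python
import math

def mahalanobis_sq(mean_a, mean_b, var):
    """Squared Mahalanobis distance under a diagonal covariance `var`."""
    return sum((a - b) ** 2 / v for a, b, v in zip(mean_a, mean_b, var))

def project_mean(mean_new, mean_old, var_old, epsilon):
    """Closest point to mean_new (in the same metric) whose squared
    Mahalanobis distance to mean_old is at most epsilon."""
    dist = mahalanobis_sq(mean_new, mean_old, var_old)
    if dist <= epsilon:
        # Already inside the trust region: the layer acts as the identity.
        return list(mean_new)
    # Outside: shrink the step toward mean_old until the bound holds with equality.
    scale = math.sqrt(epsilon / dist)
    return [mo + scale * (mn - mo) for mn, mo in zip(mean_new, mean_old)]

# Usage for a single state with hypothetical values.
mean_old = [0.0, 0.0]
var_old = [1.0, 4.0]
mean_new = [3.0, 8.0]   # squared distance 9 + 16 = 25, far outside the bound
epsilon = 0.25

projected = project_mean(mean_new, mean_old, var_old, epsilon)
projected_dist = mahalanobis_sq(projected, mean_old, var_old)
```

Because the projection uses only arithmetic and a square root, the same computation written in an automatic differentiation framework would be differentiable almost everywhere, so it could sit on top of a policy network as a layer. Applying it separately to each state's predicted mean yields a state-wise constraint.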
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Memory and Neural Computing
