TL;DR
This paper shows empirically that, in constrained reinforcement learning, entropy regularization biases policies toward keeping many viable actions available in the future, which makes constraint satisfaction robust to action noise. It also shows that strict safety constraints can be relaxed into penalties, so standard model-free RL methods can be used to achieve robust safety.
Contribution
It connects entropy regularization to robustness in constrained RL and proposes approximating safety constraints with penalties, so that the constrained problem can be solved with standard unconstrained, model-free RL.
Findings
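In constrained RL, entropy regularization biases learning toward maximizing the number of future viable actions, which promotes constraint satisfaction that is robust to action noise.
Relaxing strict safety constraints into penalties lets the constrained RL problem be approximated arbitrarily closely by an unconstrained one, which standard model-free RL can solve.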
The approach empirically improves robustness to disturbances while maintaining safety and optimality.
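The findings above can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes a hypothetical three-state tabular MDP (state 0 unsafe and terminal, state 1 at the edge next to it, state 2 in the interior), zero task reward, and soft (entropy-regularized) value iteration with temperature `alpha`. The constraint is relaxed into a fixed `penalty` on transitions into the unsafe state; all names and numbers are illustrative assumptions.

```python
import numpy as np

def soft_value_iteration(P, R, unsafe, penalty, alpha=1.0, gamma=0.9, iters=500):
    """Entropy-regularized (soft) value iteration on a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards.
    unsafe: boolean mask over states. Entering an unsafe state costs
    `penalty` and ends the episode (an assumption of this sketch).
    """
    # Expected penalty of (s, a): probability of landing in an unsafe state.
    p_unsafe = P[:, :, unsafe].sum(axis=2)
    R_pen = R - penalty * p_unsafe
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V_next = np.where(unsafe, 0.0, V)      # unsafe states are terminal
        Q = R_pen + gamma * (P @ V_next)       # shape (S, A)
        # Soft maximum: alpha * logsumexp(Q / alpha), computed stably.
        m = Q.max(axis=1, keepdims=True)
        V = (m + alpha * np.log(np.exp((Q - m) / alpha).sum(axis=1, keepdims=True))).ravel()
    pi = np.exp((Q - V[:, None]) / alpha)      # softmax policy over actions
    return pi / pi.sum(axis=1, keepdims=True), Q

# Hypothetical MDP: actions are 0 = left, 1 = right, transitions deterministic.
P = np.zeros((3, 2, 3))
P[0, :, 0] = 1.0   # unsafe state is absorbing
P[1, 0, 0] = 1.0   # edge, left  -> unsafe
P[1, 1, 2] = 1.0   # edge, right -> interior
P[2, 0, 1] = 1.0   # interior, left  -> edge
P[2, 1, 2] = 1.0   # interior, right -> stay in interior
R = np.zeros((3, 2))                      # no task reward: only entropy and penalty act
unsafe = np.array([True, False, False])

pi_lo, _ = soft_value_iteration(P, R, unsafe, penalty=1.0)
pi_hi, _ = soft_value_iteration(P, R, unsafe, penalty=20.0)
print("P(step into unsafe | edge):", pi_lo[1, 0], pi_hi[1, 0])
print("P(stay in interior):", pi_hi[2, 1])
```

In this toy setting, raising the penalty shrinks the probability of stepping into the unsafe state, and the interior policy leans toward staying away from the edge even though no reward asks for it: the edge state has only one viable action, so its entropy-regularized value is lower. Whether this carries over to larger problems depends on the choice of `alpha` and `penalty`, which this sketch does not address.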
Abstract
Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under unknown disturbances remains open. In this paper, we offer a new perspective on achieving robust safety by analyzing the interplay between two well-established techniques in model-free RL: entropy regularization, and constraints penalization. We reveal empirically that entropy regularization in constrained RL inherently biases learning toward maximizing the number of future viable actions, thereby promoting constraints satisfaction robust to action noise. Furthermore, we show that by relaxing strict safety constraints through penalties, the constrained RL problem can be approximated arbitrarily closely by an unconstrained one and thus solved using standard model-free RL. This reformulation preserves both safety and optimality while empirically…
Taxonomy
Methods: Entropy Regularization
