Leveraging Constraint Violation Signals For Action-Constrained Reinforcement Learning
Janaka Chathuranga Brahmanage, Jiajing Ling, Akshat Kumar

TL;DR
This paper introduces a novel approach for action-constrained reinforcement learning that uses constraint violation signals to train normalizing flows, reducing violations and improving efficiency compared to previous methods.
Contribution
The paper proposes a method that trains normalizing flows using constraint violation signals, which avoids the need to generate feasible action samples for training and extends to state-wise constraints.
Findings
Significantly fewer constraint violations in control tasks.
Achieves comparable or better control performance.
Simplifies learning by eliminating the need for feasible action samples.
Abstract
In many RL applications, ensuring an agent's actions adhere to constraints is crucial for safety. Most previous methods in Action-Constrained Reinforcement Learning (ACRL) employ a projection layer after the policy network to correct the action. However, projection-based methods suffer from issues like the zero gradient problem and higher runtime due to the use of optimization solvers. Recently, methods were proposed to train generative models to learn a differentiable mapping between latent variables and feasible actions to address this issue. However, generative models require training using samples from the constrained action space, which itself is challenging. To address such limitations, first, we define a target distribution for feasible actions based on constraint violation signals, and train normalizing flows by minimizing the KL divergence between an approximated distribution…
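The idea of fitting a flow to a violation-based target can be illustrated with a small numerical sketch. The abstract is truncated and does not give the exact form of the target distribution, so everything specific here is an assumption made for illustration: the unit-ball constraint, the exponential target `exp(-CV(a) / temperature)`, the toy affine flow, and all function names are hypothetical and are not taken from the paper. The sketch only shows how a reverse KL objective can be estimated from flow samples and a violation signal, without drawing any samples from the feasible set.

```python
import numpy as np

def constraint_violation(actions, radius=1.0):
    """Hypothetical violation signal: distance of each action outside an L2 ball."""
    norms = np.linalg.norm(actions, axis=-1)
    return np.maximum(norms - radius, 0.0)

def log_target_unnormalized(actions, temperature=0.05):
    """Assumed target: flat on feasible actions, decaying with the violation."""
    return -constraint_violation(actions) / temperature

def affine_flow(z, log_scale, shift):
    """Toy invertible map a = exp(log_scale) * z + shift and its log|det J|."""
    actions = np.exp(log_scale) * z + shift
    log_det = np.full(z.shape[0], np.sum(log_scale))
    return actions, log_det

def reverse_kl_estimate(log_scale, shift, n_samples=4096, seed=0):
    """Monte Carlo estimate of KL(q || p) up to the target's unknown normalizer."""
    rng = np.random.default_rng(seed)
    dim = shift.shape[0]
    z = rng.standard_normal((n_samples, dim))
    # Log-density of the standard normal base distribution at z.
    log_base = -0.5 * np.sum(z**2, axis=-1) - 0.5 * dim * np.log(2.0 * np.pi)
    actions, log_det = affine_flow(z, log_scale, shift)
    # Change of variables: log q(a) = log p_z(z) - log|det J|.
    log_q = log_base - log_det
    return np.mean(log_q - log_target_unnormalized(actions))

log_scale = np.log(np.array([0.3, 0.3]))
# Flow centered inside the feasible ball versus one centered outside it.
kl_inside = reverse_kl_estimate(log_scale, shift=np.array([0.0, 0.0]))
kl_outside = reverse_kl_estimate(log_scale, shift=np.array([2.0, 0.0]))
print(f"inside: {kl_inside:.3f}, outside: {kl_outside:.3f}")
```

Under these assumptions, the flow whose samples mostly satisfy the constraint receives the lower objective value, so minimizing this quantity with respect to the flow parameters would push probability mass toward the feasible region. A real implementation would use a more expressive flow and gradient-based optimization.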
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Normalizing Flows
