Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions
Roland Stolz, Michael Eichelbeck, Matthias Althoff

TL;DR
This paper introduces efficient numerical methods for accurately estimating key properties of truncated normal distributions in action-constrained reinforcement learning, leading to improved policy performance.
Contribution
The paper proposes numerical approximations of the entropy, log-probability, and their gradients for truncated normal policies, together with an efficient sampling strategy, to improve policy updates in action-constrained RL.
Findings
Significant performance improvements on benchmark environments
Accurate estimation of entropy and log-probability is crucial
Efficient sampling reduces computational overhead
Abstract
In reinforcement learning (RL), it is often advantageous to consider additional constraints on the action space to ensure safety or action relevance. Existing work on such action-constrained RL faces challenges regarding effective policy updates, computational efficiency, and predictable runtime. Recent work proposes to use truncated normal distributions for stochastic policy gradient methods. However, the computation of key characteristics, such as the entropy, log-probability, and their gradients, becomes intractable under complex constraints. Hence, prior work approximates these using the non-truncated distributions, which severely degrades performance. We argue that accurate estimation of these characteristics is crucial in the action-constrained RL setting, and propose efficient numerical approximations for them. We also provide an efficient sampling strategy for truncated policy…
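To make the gap between truncated and non-truncated quantities concrete, the following is a minimal sketch for the simplest possible case: a one-dimensional normal policy truncated to an interval `[low, high]`, where the log-probability, entropy, and sampling all have closed forms. This is not the paper's method, which targets complex constraint sets where such closed forms are unavailable; the function names and the interval constraint are assumptions chosen for illustration only.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for pdf / cdf / inverse cdf


def normalizer(mu, sigma, low, high):
    """Probability mass Z that N(mu, sigma^2) places on [low, high]."""
    alpha = (low - mu) / sigma
    beta = (high - mu) / sigma
    return _STD.cdf(beta) - _STD.cdf(alpha)


def untruncated_log_prob(x, mu, sigma):
    """Log-density of the plain normal (the approximation used by prior work)."""
    return math.log(NormalDist(mu, sigma).pdf(x))


def truncated_log_prob(x, mu, sigma, low, high):
    """Log-density of the normal truncated to [low, high] (illustrative 1D case)."""
    if not low <= x <= high:
        return -math.inf
    z = normalizer(mu, sigma, low, high)
    t = (x - mu) / sigma
    return math.log(_STD.pdf(t)) - math.log(sigma) - math.log(z)


def untruncated_entropy(sigma):
    """Differential entropy of N(mu, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma * sigma)


def truncated_entropy(mu, sigma, low, high):
    """Closed-form differential entropy of the interval-truncated normal."""
    alpha = (low - mu) / sigma
    beta = (high - mu) / sigma
    z = _STD.cdf(beta) - _STD.cdf(alpha)
    boundary = (alpha * _STD.pdf(alpha) - beta * _STD.pdf(beta)) / (2.0 * z)
    return math.log(math.sqrt(2.0 * math.pi * math.e) * sigma * z) + boundary


def sample_truncated(mu, sigma, low, high, rng):
    """Inverse-CDF sampling: draw u between the CDF values of the bounds."""
    alpha = (low - mu) / sigma
    beta = (high - mu) / sigma
    u = rng.uniform(_STD.cdf(alpha), _STD.cdf(beta))
    x = mu + sigma * _STD.inv_cdf(u)
    return min(max(x, low), high)  # guard against floating-point overshoot
```

For example, with `mu=0.8`, `sigma=0.5` and the interval `[-1, 1]`, the truncated log-probability of any feasible action exceeds the untruncated one by `-log Z`, roughly 0.42 nats, and the truncated entropy is noticeably smaller than the untruncated entropy. The offset `-log Z` depends on `mu` and `sigma`, so the gradients differ as well as the values.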
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
