Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties
Hao Jiang, Tien Mai, Pradeep Varakantham, Minh Huy Hoang

TL;DR
This paper introduces an augmented state and reward penalty approach to solve constrained reinforcement learning problems, effectively handling expected cost constraints and outperforming existing methods on benchmarks.
Contribution
It presents a novel unconstrained formulation with state augmentation and reward penalties, offering a new paradigm for constrained RL solutions.
Findings
Outperforms leading approaches on benchmark problems
Provides a general and theoretically sound formulation
Effectively manages expected cost constraints
Abstract
Constrained Reinforcement Learning has been employed to enforce safety constraints on a policy through the use of expected cost constraints. The key challenge is in handling the expected cost accumulated under the policy, not just the cost of a single step. Existing methods have developed innovative ways of converting this cost constraint over the entire policy into constraints over local decisions (at each time step). While such approaches have provided good solutions with regard to the objective, they can be either overly aggressive or overly conservative with respect to costs. This is owing to the use of estimates for "future" or "backward" costs in local cost constraints. To that end, we provide an equivalent unconstrained formulation of constrained RL that has an augmented state space and reward penalties. This intuitive formulation is general and has interesting theoretical properties. More importantly, this…
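The general idea of an augmented state with reward penalties can be illustrated with a small sketch. This summary does not give the paper's exact formulation, so everything below is an assumption made for illustration: the toy environment `ToyCostEnv`, the wrapper `CostAugmentedEnv`, the parameters `cost_budget` and `penalty`, and the choice to subtract a fixed penalty on every step where the accumulated cost exceeds the budget are all hypothetical and are not taken from the authors' method.

```python
class ToyCostEnv:
    """Hypothetical toy task: action 1 earns reward 1 but incurs cost 1;
    action 0 earns nothing and costs nothing."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
        self.pos = 0

    def reset(self):
        self.t = 0
        self.pos = 0
        return self.pos

    def step(self, action):
        self.t += 1
        self.pos += action
        reward = float(action)
        cost = float(action)
        done = self.t >= self.horizon
        return self.pos, reward, cost, done


class CostAugmentedEnv:
    """Illustrative wrapper (an assumption, not the paper's algorithm).

    The observation is augmented with the cost accumulated so far, and the
    reward is reduced by `penalty` on any step where that accumulated cost
    exceeds `cost_budget`. A standard unconstrained RL algorithm could then
    be trained on the wrapped environment.
    """

    def __init__(self, env, cost_budget, penalty):
        self.env = env
        self.cost_budget = cost_budget
        self.penalty = penalty
        self.accumulated_cost = 0.0

    def reset(self):
        self.accumulated_cost = 0.0
        state = self.env.reset()
        return (state, self.accumulated_cost)

    def step(self, action):
        state, reward, cost, done = self.env.step(action)
        self.accumulated_cost += cost
        if self.accumulated_cost > self.cost_budget:
            reward -= self.penalty  # assumed penalty rule
        return (state, self.accumulated_cost), reward, done


if __name__ == "__main__":
    env = CostAugmentedEnv(ToyCostEnv(horizon=3), cost_budget=1.0, penalty=10.0)
    obs = env.reset()
    for action in (1, 1, 0):
        obs, reward, done = env.step(action)
        print(obs, reward, done)
```

Because the accumulated cost is part of the observation, a policy trained on the wrapped environment can condition its decisions on how much of the budget remains, rather than relying on a separate estimate of past or future cost.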
Taxonomy
Topics: Safety Systems Engineering in Autonomy · Software Reliability and Analysis Research · Occupational Health and Safety Research
