Reinforcement Learning with Almost Sure Constraints
Agustin Castellano, Hancheng Min, Juan Bazerque, Enrique Mallada

TL;DR
This paper introduces a novel approach for finding feasible policies in Constrained Markov Decision Processes using a scalar budget variable, enabling almost sure constraint satisfaction and improving policy search efficiency.
Contribution
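It proposes a new class of policies augmented with a scalar budget variable, analyzes a Bellman-like operator whose smallest fixed point gives the minimal budget, and provides methods for learning this budget when the transition kernel is unknown, together with sample-complexity bounds.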
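A budget-augmented policy can be illustrated with a minimal Python sketch. It assumes a finite MDP with nonnegative costs c(s,a), a constraint requiring the cumulative cost along every trajectory that occurs with positive probability to stay within the initial budget, and a precomputed table `b_min` of minimal budgets (assumed nonnegative). The function names, the toy numbers, and this particular action-filtering rule are illustrative assumptions, not the authors' construction. The controller state is the pair (s, b): an action is admissible only if its cost plus the largest minimal budget among its possible successors fits in b, and the budget is decremented by the cost actually paid.

```python
from typing import List, Sequence

def allowed_actions(s: int, b: float,
                    cost: Sequence[Sequence[float]],
                    support: Sequence[Sequence[Sequence[int]]],
                    b_min: Sequence[float]) -> List[int]:
    """Actions at state s whose worst-case cumulative cost fits in budget b.

    Assumptions (hypothetical setup): cost[s][a] >= 0, support[s][a] lists
    every s' with P(s'|s, a) > 0, and b_min[s'] >= 0 is the minimal budget
    needed to keep acting safely from s'.
    """
    return [a for a in range(len(cost[s]))
            if cost[s][a] + max(b_min[sp] for sp in support[s][a]) <= b]

def update_budget(b: float, c: float) -> float:
    """Remaining budget after paying cost c.

    If the action was admissible, the result is >= b_min of every possible
    successor, hence >= 0.
    """
    return b - c

# Hypothetical toy numbers: from state 0, action 0 costs 1 and leads to
# state 1 (needs 0 more); action 1 is free but leads to state 2 (needs 5).
cost = [[1.0, 0.0], [0.0], [0.0]]
support = [[[1], [2]], [[1]], [[2]]]
b_min = [1.0, 0.0, 5.0]
print(allowed_actions(0, 2.0, cost, support, b_min))  # [0]
print(allowed_actions(0, 5.0, cost, support, b_min))  # [0, 1]
```

With a budget of 2 only the costly but safe action survives; with a budget of 5 the controller may also take the free action, because it can afford the worst case downstream.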
Findings
The minimal budget can be computed as the smallest fixed point of a Bellman-like operator.
The approach guarantees that constraints are satisfied almost surely (with probability one), rather than only in expectation as in expectation-based formulations.
Simulations demonstrate the effectiveness of the method in constrained policy optimization.
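The fixed-point characterization can be sketched as value iteration on a worst-case operator. The sketch assumes the operator has the form (T B)(s) = min_a [c(s,a) + max over s' with P(s'|s,a) > 0 of B(s')]; this form, the function name `min_budget`, and the toy model are assumptions for a finite MDP with nonnegative costs and may differ from the paper's exact definition. Starting from B = 0 is deliberate: since costs are nonnegative, 0 ≤ T(0), and T is monotone, so the iterates increase and stay below every fixed point; their limit, when finite, is therefore the smallest one.

```python
from typing import List, Sequence

def min_budget(cost: Sequence[Sequence[float]],
               support: Sequence[Sequence[Sequence[int]]],
               max_iter: int = 10_000, tol: float = 1e-12) -> List[float]:
    """Iterate (T B)(s) = min_a [c(s, a) + max_{s' in supp P(.|s,a)} B(s')] from B = 0.

    Assumed setting: finite S and A, cost[s][a] >= 0, and support[s][a]
    listing every s' with P(s'|s, a) > 0. The iterates increase
    monotonically toward the smallest fixed point.
    """
    B = [0.0] * len(cost)
    for _ in range(max_iter):
        B_new = [min(cost[s][a] + max(B[sp] for sp in support[s][a])
                     for a in range(len(cost[s])))
                 for s in range(len(cost))]
        if max(abs(x - y) for x, y in zip(B_new, B)) <= tol:
            return B_new
        B = B_new
    raise RuntimeError("no convergence; the minimal budget may be unbounded")

# Hypothetical toy MDP; state 2 is an absorbing, cost-free goal.
# State 0: action 0 costs 3 and goes to 2; action 1 costs 1 and goes to 1 or 2.
# State 1: its only action costs 1 and goes to 2.
cost = [[3.0, 1.0], [1.0], [0.0]]
support = [[[2], [1, 2]], [[2]], [[2]]]
print(min_budget(cost, support))  # [2.0, 1.0, 0.0]
```

In state 0, action 1 can reach state 1 with positive probability, so its worst-case requirement is 1 + 1 = 2, still below the 3 needed by action 0. An expectation-based criterion would instead weight the two branches by their probabilities; here only the support of the kernel matters.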
Abstract
In this work we address the problem of finding feasible policies for Constrained Markov Decision Processes under probability-one constraints. We argue that stationary policies are not sufficient for solving this problem, and that a rich class of policies can be found by endowing the controller with a scalar quantity, the so-called budget, that tracks how close the agent is to violating the constraint. We show that the minimal budget required to act safely can be obtained as the smallest fixed point of a Bellman-like operator, whose convergence properties we analyze. We also show how to learn this quantity when the true kernel of the Markov decision process is not known, and we provide sample-complexity bounds. The utility of knowing this minimal budget lies in the fact that it can aid the search for optimal or near-optimal policies by shrinking down the region of the state space the…
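If the minimal budget depends on the kernel only through its support (which holds for worst-case operators of the form min_a [c(s,a) + max over s' in supp P(·|s,a) of B(s')], assumed here for illustration), learning it reduces to discovering which transitions are possible. The sketch below draws n samples per state-action pair from a hypothetical simulator `sample_next(s, a)` and records the observed successors. The helper `samples_needed` gives a sample size from an elementary union bound, assuming every nonzero transition probability is at least `p_min`; it illustrates the idea and is not the sample-complexity bound derived in the paper.

```python
import math
import random
from typing import Callable, List

def estimate_support(sample_next: Callable[[int, int], int],
                     n_states: int, n_actions: int,
                     n_samples: int) -> List[List[List[int]]]:
    """Empirical support: the next states observed in n_samples draws per (s, a).

    `sample_next` is a hypothetical simulator returning s' ~ P(.|s, a).
    """
    return [[sorted({sample_next(s, a) for _ in range(n_samples)})
             for a in range(n_actions)]
            for s in range(n_states)]

def samples_needed(n_states: int, n_actions: int,
                   p_min: float, delta: float) -> int:
    """Samples per (s, a) so that, with probability >= 1 - delta, every
    transition with P(s'|s, a) >= p_min is observed at least once.

    Union bound over at most S*A*S triples: S^2 * A * (1 - p_min)^n <= delta.
    """
    triples = n_states * n_states * n_actions
    return math.ceil(math.log(triples / delta) / -math.log1p(-p_min))

# Toy usage with a seeded random simulator (hypothetical dynamics):
# from s, action a moves to s or to (s + a + 1) % 3 with equal probability.
rng = random.Random(0)

def sample_next(s: int, a: int) -> int:
    return rng.choice([s, (s + a + 1) % 3])

n = samples_needed(3, 2, p_min=0.5, delta=0.01)
print(n)
print(estimate_support(sample_next, 3, 2, n))
```

Because the empirical support is always a subset of the true one, plugging it into a max-based operator like the one assumed above can only underestimate the minimal budget; the union bound controls the probability that any possible transition is missed.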
Taxonomy
Topics: Reinforcement Learning in Robotics
