Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs
Saber Omidi, Marek Petrik, Se Young Yoon, and Momotaz Begum

TL;DR
This paper introduces an algorithm that uses average reward MDPs to compute safe policies for stochastic control systems, so that safety constraints hold with high confidence despite system uncertainty.
Contribution
The paper presents a new approach that reduces safety verification to average reward MDPs, enabling efficient computation of safe policies using linear programming techniques.
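As a rough illustration of what solving an average reward MDP with a linear program can look like, the sketch below uses the standard occupancy-measure LP for a small unichain MDP. The 3-state, 2-action transition matrices, the choice of reward 1 in safe states and 0 in the unsafe state, and the helper name solve_average_reward_lp are all assumptions made for this example; they are not taken from the paper, whose exact reduction may differ.

```python
import numpy as np
from scipy.optimize import linprog


def solve_average_reward_lp(P, r):
    """Solve a unichain average reward MDP via the occupancy-measure LP.

    P: array of shape (n_actions, n_states, n_states), P[a, s, s2] = Pr(s2 | s, a).
    r: array of shape (n_states, n_actions), one-step reward.
    Returns (gain, greedy policy, occupancy measure x[s, a]).
    """
    n_actions, n_states, _ = P.shape
    n_vars = n_states * n_actions  # variable x[s, a] sits at index s * n_actions + a

    # Flow balance: sum_a x[s2, a] - sum_{s, a} P[a, s, s2] * x[s, a] = 0 for every s2.
    A_eq = np.zeros((n_states + 1, n_vars))
    for s in range(n_states):
        for a in range(n_actions):
            col = s * n_actions + a
            A_eq[s, col] += 1.0
            A_eq[:n_states, col] -= P[a, s, :]
    # Normalization: the occupancy measure is a probability distribution.
    A_eq[n_states, :] = 1.0
    b_eq = np.zeros(n_states + 1)
    b_eq[n_states] = 1.0

    # linprog minimizes, so negate the reward to maximize the average reward.
    res = linprog(-r.reshape(-1), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)

    x = res.x.reshape(n_states, n_actions)
    policy = x.argmax(axis=1)  # action carrying the occupancy mass in each state
    return -res.fun, policy, x


# Hypothetical toy system: states 0 and 1 are safe, state 2 is unsafe.
P = np.array([
    [[0.80, 0.15, 0.05], [0.5, 0.4, 0.1], [0.6, 0.3, 0.1]],  # action 0 (cautious)
    [[0.30, 0.30, 0.40], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7]],  # action 1 (aggressive)
])
# Assumed safety reward: 1 while in a safe state, 0 in the unsafe state.
r = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])

gain, policy, x = solve_average_reward_lp(P, r)
print("long-run fraction of time in safe states:", gain)
print("policy (action per state):", policy)
```

With this reward choice, the optimal average reward can be read as the long-run fraction of time the system spends in the safe set, which is one simple way to attach a safety level to a policy. The LP in this form relies on the unichain assumption; multichain models need the more general formulation.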
Findings
The method provides a more comprehensive safety analysis than discounted reward approaches.
It converges faster than discounted reward approaches.
It yields higher-quality safe policies.
Abstract
Safety in stochastic control systems, which are subject to random noise with a known probability distribution, aims to compute policies that satisfy predefined operational constraints with high confidence throughout the uncertain evolution of the state variables. The unpredictable evolution of state variables poses a significant challenge for meeting predefined constraints using various control methods. To address this, we present a new algorithm that computes safe policies to determine the safety level across a finite state set. This algorithm reduces the safety objective to the standard average reward Markov Decision Process (MDP) objective. This reduction enables us to use standard techniques, such as linear programs, to compute and analyze safe policies. We validate the proposed method numerically on the Double Integrator and the Inverted Pendulum systems. Results indicate that the…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Formal Methods in Verification
