Algorithms for Deciding the Safety of States in Fully Observable Non-deterministic Problems: Technical Report
Johannes Schmalz, Chaahat Jain

TL;DR
This paper introduces iPI, a new policy-iteration algorithm for deciding whether states are safe in fully observable non-deterministic problems. It guarantees polynomial worst-case runtime, whereas the previously most effective algorithm, TarjanSafe, has exponential worst-case runtime with respect to the state space.
Contribution
The paper shows that TarjanSafe has exponential worst-case runtime with respect to the state space, and presents iPI, a novel policy-iteration algorithm that matches TarjanSafe's best-case runtime while guaranteeing a polynomial worst case.
Findings
iPI matches TarjanSafe's best-case performance
iPI has polynomial worst-case runtime, whereas TarjanSafe's worst case is exponential in the state space
Experimental results confirm theoretical advantages
Abstract
Learned action policies are increasingly popular in sequential decision-making, but suffer from a lack of safety guarantees. Recent work introduced a pipeline for testing the safety of such policies under initial-state and action-outcome non-determinism. At the pipeline's core is the problem of deciding whether a state is safe (a safe policy exists from the state) and finding faults, which are state-action pairs that transition from a safe state to an unsafe one. Their most effective algorithm for deciding safety, TarjanSafe, performs well on their benchmarks, but we show that it has exponential worst-case runtime with respect to the state space. A linear-time alternative exists, but it is slower in practice. We close this gap with a new policy-iteration algorithm, iPI, that combines the best of both: it matches TarjanSafe's best-case runtime while guaranteeing a polynomial worst-case.…
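To make the two notions in the abstract concrete (a safe state and a fault), here is a minimal sketch in Python. It is neither TarjanSafe nor iPI, and it is not the linear-time algorithm the abstract mentions: it is a naive fixed-point iteration written only to illustrate the definitions. The names `unsafe_states`, `faults`, `transitions`, and `bad` are hypothetical. The sketch assumes that a state is unsafe if it is explicitly marked bad or if every applicable action has at least one outcome leading to an unsafe state, and that a non-bad state with no applicable actions counts as safe. The paper's exact definitions may differ.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
# transitions[state][action] = set of possible outcomes of that action
# (more than one outcome models action-outcome non-determinism).
Transitions = Dict[State, Dict[str, Set[State]]]


def unsafe_states(transitions: Transitions, bad: Set[State]) -> Set[State]:
    """Return the states from which no safe policy exists (illustrative only).

    Naive fixed point: repeatedly mark a state unsafe when every applicable
    action has some outcome that is already unsafe. Assumption: a non-bad
    state without applicable actions is treated as safe.
    """
    unsafe = set(bad)
    changed = True
    while changed:
        changed = False
        for state, actions in transitions.items():
            if state in unsafe or not actions:
                continue
            # No action keeps all of its outcomes inside the safe region.
            if all(outcomes & unsafe for outcomes in actions.values()):
                unsafe.add(state)
                changed = True
    return unsafe


def faults(transitions: Transitions, bad: Set[State]) -> List[Tuple[State, str]]:
    """Return (state, action) pairs that can move from a safe to an unsafe state."""
    unsafe = unsafe_states(transitions, bad)
    return [
        (state, action)
        for state, actions in transitions.items()
        if state not in unsafe
        for action, outcomes in actions.items()
        if outcomes & unsafe
    ]


# Hypothetical toy problem: from s0, action "a" is always safe,
# while action "b" may end up in s2, from which "bad" is unavoidable.
example_transitions: Transitions = {
    "s0": {"a": {"s1"}, "b": {"s1", "s2"}},
    "s1": {"a": {"s1"}},
    "s2": {"a": {"bad"}},
    "bad": {},
}
example_bad: Set[State] = {"bad"}
```

In this toy problem, `s0` is safe because a policy that always picks `a` never reaches an unsafe state, and the pair `("s0", "b")` is a fault because one outcome of `b` leads to `s2`. The loop above can rescan the whole state space once per newly marked state, so it is quadratic in the worst case; it is meant to show what is being decided, not how to decide it efficiently.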
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Adversarial Robustness in Machine Learning
