Algorithms for Deciding the Safety of States in Fully Observable Non-deterministic Problems: Technical Report

Johannes Schmalz; Chaahat Jain

arXiv:2603.15282·cs.AI·March 17, 2026

Open Access

TL;DR

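This paper introduces iPI, a policy-iteration algorithm for deciding whether states are safe in fully observable non-deterministic problems. iPI guarantees a polynomial worst-case runtime, whereas the existing TarjanSafe algorithm is shown to have exponential worst-case runtime with respect to the state space, and it matches TarjanSafe's best-case runtime.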

Contribution

The paper shows that TarjanSafe has exponential worst-case runtime with respect to the state space, and presents iPI, a policy-iteration algorithm that matches TarjanSafe's best-case runtime while guaranteeing a polynomial worst case.

Findings

01 iPI matches TarjanSafe's best-case runtime

02 iPI has a polynomial worst-case runtime, whereas TarjanSafe's worst case is exponential with respect to the state space

03 Experimental results confirm the theoretical advantages

Abstract

Learned action policies are increasingly popular in sequential decision-making, but suffer from a lack of safety guarantees. Recent work introduced a pipeline for testing the safety of such policies under initial-state and action-outcome non-determinism. At the pipeline's core is the problem of deciding whether a state is safe (a safe policy exists from the state) and finding faults, which are state-action pairs that transition from a safe state to an unsafe one. The most effective algorithm from that work for deciding safety, TarjanSafe, performs well on its benchmarks, but we show that it has exponential worst-case runtime with respect to the state space. A linear-time alternative exists, but it is slower in practice. We close this gap with a new policy-iteration algorithm, iPI, that combines the best of both: it matches TarjanSafe's best-case runtime while guaranteeing a polynomial worst case.…
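To make the safety and fault definitions in the abstract concrete, here is a minimal sketch of a naive fixpoint computation. It is not TarjanSafe, iPI, or the linear-time alternative mentioned above, and it makes no claim to their runtime behavior. The function names, the dictionary encoding of the transition system, and the treatment of non-failing states without applicable actions as safe are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
# Hypothetical encoding: transitions[state][action] lists every possible
# outcome state of that action. Every state is assumed to appear as a key.
Transitions = Dict[State, Dict[str, List[State]]]


def safe_states(transitions: Transitions, fail_states: Set[State]) -> Set[State]:
    """Return the states from which some policy can avoid all fail states."""
    # Start optimistic: every non-failing state is a candidate.
    safe = {s for s in transitions if s not in fail_states}
    changed = True
    while changed:
        changed = False
        for s in list(safe):
            actions = transitions[s]
            if not actions:
                # Assumption: a non-failing state with no actions is safe.
                continue
            # s stays safe only if some action has all outcomes in `safe`.
            if not any(all(t in safe for t in outs) for outs in actions.values()):
                safe.discard(s)
                changed = True
    return safe


def faults(transitions: Transitions, safe: Set[State]) -> Set[Tuple[State, str]]:
    """Return pairs (s, a) where s is safe but a may lead to an unsafe state."""
    return {
        (s, a)
        for s in safe
        for a, outs in transitions[s].items()
        if any(t not in safe for t in outs)
    }


if __name__ == "__main__":
    example: Transitions = {
        "s0": {"a": ["s1", "s2"], "b": ["s1"]},
        "s1": {"stay": ["s1"]},
        "s2": {"go": ["fail"]},
        "fail": {},
    }
    safe = safe_states(example, {"fail"})
    print(sorted(safe))             # ['s0', 's1']
    print(faults(example, safe))    # {('s0', 'a')}
```

In the example, s2 is unsafe because its only action leads to the fail state. State s0 remains safe because action b avoids s2, which makes (s0, a) a fault under the reading that an action is a fault if any of its outcomes is unsafe.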

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Adversarial Robustness in Machine Learning