Safe Policy Improvement Approaches and their Limitations
Philipp Scholl, Felix Dietrich, Clemens Otte, Steffen Udluft

TL;DR
This paper critically examines Safe Policy Improvement algorithms in offline reinforcement learning, revealing limitations in existing methods and proposing new algorithms with provable safety guarantees, supported by extensive experiments.
Contribution
The paper identifies flaws in the safety claims of Soft-SPIBB, introduces the Adv-Soft-SPIBB algorithms with proven safety guarantees, and demonstrates the practical limitations of safety bounds in real data scenarios.
Findings
Soft-SPIBB safety claims are invalid.
Adv-Soft-SPIBB algorithms are provably safe.
Heuristic Lower-Approx-Soft-SPIBB performs best in experiments.
Abstract
Safe Policy Improvement (SPI) is an important technique for offline reinforcement learning in safety-critical applications, as it improves the behavior policy with high probability. We classify various SPI approaches from the literature into two groups, based on how they utilize the uncertainty of state-action pairs. Focusing on the Soft-SPIBB (Safe Policy Improvement with Soft Baseline Bootstrapping) algorithms, we show that their claim of being provably safe does not hold. Based on this finding, we develop adaptations, the Adv-Soft-SPIBB algorithms, and show that they are provably safe. A heuristic adaptation, Lower-Approx-Soft-SPIBB, yields the best performance among all SPIBB algorithms in extensive experiments on two benchmarks. We also check the safety guarantees of the provably safe algorithms and show that huge amounts of data are necessary such that the safety bounds become…
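To make "utilizing the uncertainty of state-action pairs" concrete, here is a minimal sketch of the kind of constraint Soft-SPIBB places on the new policy, as we recall it from the original Soft-SPIBB paper (Nadjahi, Laroche and Tachet des Combes, 2019): each state-action pair gets an uncertainty e(s, a) that shrinks with its visit count N(s, a), and the new policy π may deviate from the behavior policy π_b in a state only as long as Σ_a e(s, a) |π(a|s) − π_b(a|s)| ≤ ε. The exact constant inside the logarithm is an assumption here, the function names are hypothetical, and this is not the authors' implementation of Soft-SPIBB or Adv-Soft-SPIBB.

```python
import math


def error_bound(n_sa: int, n_states: int, n_actions: int, delta: float) -> float:
    """Hypothetical helper: uncertainty e(s, a) of a pair observed n_sa times.

    Assumed form: sqrt(2 / N(s, a) * log(2 |S| |A| 2^|A| / delta)).
    Pairs never observed in the dataset get infinite uncertainty.
    """
    if n_sa == 0:
        return math.inf
    log_term = math.log(2 * n_states * n_actions * 2 ** n_actions / delta)
    return math.sqrt(2.0 / n_sa * log_term)


def satisfies_soft_constraint(pi, pi_b, errors, epsilon):
    """Hypothetical helper: check sum_a e(s,a) * |pi(a|s) - pi_b(a|s)| <= epsilon in one state."""
    total = 0.0
    for p, p_b, e in zip(pi, pi_b, errors):
        diff = abs(p - p_b)
        if diff == 0.0:
            # Unchanged actions cost nothing; skipping avoids inf * 0 = nan.
            continue
        total += e * diff
    return total <= epsilon
```

With this form, moving any probability mass onto an action that never appears in the data makes the weighted deviation infinite, so the new policy must copy π_b there; well-visited actions are cheap to change. This is one way to see why large datasets are needed before the resulting bounds become meaningful.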
Taxonomy
Topics: Reinforcement Learning in Robotics · Software Reliability and Analysis Research · Formal Methods in Verification
