Safe Policy Improvement by Minimizing Robust Baseline Regret
Marek Petrik, Yinlam Chow, Mohammad Ghavamzadeh

TL;DR
This paper introduces a robust, model-based method for safe policy improvement in sequential decision-making. It uses an inaccurate system model with known accuracy guarantees to minimize regret relative to a baseline policy, so the returned policy is guaranteed to perform at least as well as that baseline.
Contribution
It proposes a regret-minimization formulation of safe policy improvement that accounts for model inaccuracies, shows that this formulation is NP-hard, and provides an approximate algorithm with strong empirical performance.
Findings
The method guarantees performance at least as good as the baseline.
The approximate algorithm significantly outperforms standard approaches.
Empirical results demonstrate effectiveness across multiple domains.
Abstract
An important problem in sequential decision-making under uncertainty is to use limited data to compute a safe policy, i.e., a policy that is guaranteed to perform at least as well as a given baseline strategy. In this paper, we develop and analyze a new model-based approach to compute a safe policy when we have access to an inaccurate dynamics model of the system with known accuracy guarantees. Our proposed robust method uses this (inaccurate) model to directly minimize the (negative) regret w.r.t. the baseline policy. Unlike existing approaches, minimizing the regret allows one to improve the baseline policy in states with accurate dynamics and to fall back seamlessly to the baseline policy otherwise. We show that our formulation is NP-hard and propose an approximate algorithm. Our empirical results on several domains show that even this relatively simple approximate algorithm…
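The idea of improving on the baseline only where the dynamics are well known can be illustrated with a small sketch. The code below is not the paper's algorithm: it assumes a toy four-state MDP, represents model uncertainty as a finite list of candidate transition models, and enumerates deterministic policies by brute force. The names `make_model`, `policy_return`, and `robust_improvement` are hypothetical. For each policy it computes the worst-case gain over the baseline across all candidate models and keeps the policy whose worst-case gain is largest.

```python
import itertools
import numpy as np


def make_model(q):
    """Toy 4-state, 2-action MDP (an assumption made for illustration).

    State 0: both actions lead to state 1 (dynamics known exactly).
    State 1: action 0 stays in state 1; action 1 reaches the good state 2
             with probability q and the bad state 3 otherwise.
    States 2 and 3 are absorbing.
    """
    P = np.zeros((4, 2, 4))
    P[0, :, 1] = 1.0
    P[1, 0, 1] = 1.0
    P[1, 1, 2] = q
    P[1, 1, 3] = 1.0 - q
    P[2, :, 2] = 1.0
    P[3, :, 3] = 1.0
    return P


def policy_return(P, r, policy, p0, gamma):
    """Discounted return of a deterministic policy under transition model P."""
    n_states = r.shape[0]
    idx = np.arange(n_states)
    actions = np.asarray(policy)
    P_pi = P[idx, actions]  # (S, S) transition matrix induced by the policy
    r_pi = r[idx, actions]  # (S,) reward vector induced by the policy
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return float(p0 @ v)


def robust_improvement(models, r, baseline, p0, gamma):
    """Maximize the worst-case gain over the baseline across candidate models.

    Brute-force enumeration of deterministic policies; only viable for tiny MDPs.
    """
    n_states, n_actions = r.shape
    best_policy, best_gain = tuple(baseline), 0.0  # the baseline itself has gain 0
    for policy in itertools.product(range(n_actions), repeat=n_states):
        worst_gain = min(
            policy_return(P, r, policy, p0, gamma)
            - policy_return(P, r, baseline, p0, gamma)
            for P in models
        )
        if worst_gain > best_gain + 1e-9:
            best_policy, best_gain = policy, worst_gain
    return best_policy, best_gain


# Rewards r[s, a]: action 1 pays off in state 0; state 2 is good, state 3 is bad.
r = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]])
p0 = np.array([1.0, 0.0, 0.0, 0.0])
gamma = 0.9
baseline = (0, 0, 0, 0)

# Two candidate models that disagree only about action 1 in state 1.
models = [make_model(0.9), make_model(0.1)]

robust_policy, robust_gain = robust_improvement(models, r, baseline, p0, gamma)
nominal_policy, nominal_gain = robust_improvement(models[:1], r, baseline, p0, gamma)
print("robust: ", robust_policy, robust_gain)
print("nominal:", nominal_policy, nominal_gain)
```

In this toy setup the robust choice switches to action 1 in state 0, where all candidate models agree, and keeps the baseline action in state 1, where they disagree. Optimizing against the single optimistic model instead changes the action in state 1 as well, which would lose to the baseline under the pessimistic model.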
Taxonomy
Topics: Probabilistic and Robust Engineering Design · Fault Detection and Control Systems · Advanced Control Systems Optimization
