Safe Policy Improvement by Minimizing Robust Baseline Regret

Marek Petrik; Yinlam Chow; Mohammad Ghavamzadeh

arXiv:1607.03842 · stat.ML · July 14, 2016 · NeurIPS · 68 cites

TL;DR

This paper introduces a robust, model-based method for safe policy improvement in sequential decision-making. Using an inaccurate dynamics model with known accuracy guarantees, it directly minimizes the (negative) regret with respect to a baseline policy, so that the computed policy is guaranteed to perform at least as well as that baseline.

Contribution

It proposes a regret-minimization formulation of safe policy improvement that accounts for model inaccuracies, shows that this formulation is NP-hard, and provides an approximate algorithm with strong empirical performance.

Findings

01. The method guarantees performance at least as good as the baseline.

02. The approximate algorithm significantly outperforms standard approaches.

03. Empirical results demonstrate effectiveness across multiple domains.

Abstract

An important problem in sequential decision-making under uncertainty is to use limited data to compute a safe policy, i.e., a policy that is guaranteed to perform at least as well as a given baseline strategy. In this paper, we develop and analyze a new model-based approach to compute a safe policy when we have access to an inaccurate dynamics model of the system with known accuracy guarantees. Our proposed robust method uses this (inaccurate) model to directly minimize the (negative) regret w.r.t. the baseline policy. Contrary to the existing approaches, minimizing the regret allows one to improve the baseline policy in states with accurate dynamics and seamlessly fall back to the baseline policy, otherwise. We show that our formulation is NP-hard and propose an approximate algorithm. Our empirical results on several domains show that even this relatively simple approximate algorithm…
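The objective described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes a finite set of candidate transition models in place of the paper's accuracy guarantees, restricts the search to deterministic policies, and finds the best policy by brute-force enumeration. All names (`policy_return`, `robust_regret`, `best_safe_policy`) and the three-state example are hypothetical.

```python
from itertools import product

GAMMA = 0.9  # discount factor (assumed for this example)


def policy_return(P, R, policy, p0, gamma=GAMMA, iters=200):
    """Discounted return of a deterministic policy under transition model P.

    P[s][a] is a list of next-state probabilities, R[s][a] is a reward,
    and p0 is the initial state distribution.
    """
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):  # iterative policy evaluation
        v = [
            R[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n)
        ]
    return sum(p0[s] * v[s] for s in range(n))


def robust_regret(models, R, policy, baseline, p0):
    """Worst case, over the candidate models, of return(policy) - return(baseline)."""
    return min(
        policy_return(P, R, policy, p0) - policy_return(P, R, baseline, p0)
        for P in models
    )


def best_safe_policy(models, R, baseline, p0, n_actions):
    """Enumerate deterministic policies and keep the one with the largest
    worst-case improvement over the baseline (brute force, tiny problems only)."""
    candidates = product(range(n_actions), repeat=len(R))
    return max(candidates, key=lambda pi: robust_regret(models, R, pi, baseline, p0))


# Toy problem: state 2 is absorbing with zero reward.
R = [[1.0, 0.0], [1.0, 2.0], [0.0, 0.0]]
TO_1, TO_2 = [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
# The two models disagree only on action 1 in state 0 (uncertain dynamics).
MODEL_A = [[TO_2, TO_1], [TO_2, TO_2], [TO_2, TO_2]]
MODEL_B = [[TO_2, TO_2], [TO_2, TO_2], [TO_2, TO_2]]
MODELS = [MODEL_A, MODEL_B]
P0 = [0.5, 0.5, 0.0]
BASELINE = (0, 0, 0)

if __name__ == "__main__":
    pi = best_safe_policy(MODELS, R, BASELINE, P0, n_actions=2)
    print(pi, robust_regret(MODELS, R, pi, BASELINE, P0))
```

In this example the selected policy keeps the baseline action in state 0, where the two models disagree, and switches to the higher-reward action in state 1, where they agree. Because the baseline itself is among the candidates and scores exactly zero, the selected policy's worst-case improvement can never be negative.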

Taxonomy

Topics: Probabilistic and Robust Engineering Design · Fault Detection and Control Systems · Advanced Control Systems Optimization