Safe Policy Improvement with Baseline Bootstrapping

Romain Laroche; Paul Trichelair; Rémi Tachet des Combes

arXiv:1712.06924 · cs.LG · June 11, 2019 · 82 citations

TL;DR

This paper introduces SPIBB, a safe policy improvement method for batch reinforcement learning. The trained policy falls back on the baseline policy wherever uncertainty is high, so that it performs at least as well as the baseline. The method is shown to be effective in several domains, including deep RL.

Contribution

The paper proposes SPIBB, a novel safe policy improvement algorithm with theoretical guarantees, practical variants, and a model-free deep RL implementation that outperforms existing methods in safety and performance.

Findings

01

In batch RL, SPIBB guarantees that the trained policy performs at least as well as the baseline policy.

02

On randomly generated MDPs, SPIBB outperforms existing algorithms in both safety and mean performance.

03

The deep RL version, SPIBB-DQN, trains efficiently without environment interaction.

Abstract

This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guaranteed to perform at least as well as the baseline policy used to collect the data. Our approach, called SPI with Baseline Bootstrapping (SPIBB), is inspired by the knows-what-it-knows paradigm: it bootstraps the trained policy with the baseline when the uncertainty is high. Our first algorithm, $\Pi_b$-SPIBB, comes with SPI theoretical guarantees. We also implement a variant, $\Pi_{\leq b}$-SPIBB, that is even more efficient in practice. We apply our algorithms to a motivational stochastic gridworld domain and further demonstrate on randomly generated MDPs the superiority of SPIBB with respect to existing algorithms, not only in safety but also in mean performance. Finally, we implement a…
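The bootstrapping idea in the abstract can be illustrated with a minimal sketch of a single-state policy improvement step. It rests on assumptions that the abstract does not state: uncertainty is measured by how often each action was observed in the dataset, and a hypothetical threshold `n_min` separates uncertain actions from trusted ones. The function name `spibb_greedy_step` and all of its parameters are illustrative and are not taken from the authors' code.

```python
def spibb_greedy_step(q_values, baseline_probs, counts, n_min):
    """Sketch of one-state policy improvement with baseline bootstrapping.

    q_values, baseline_probs, counts: lists indexed by action.
    n_min: hypothetical count threshold (an assumption of this sketch);
           actions seen fewer than n_min times are treated as uncertain.
    """
    n_actions = len(q_values)
    policy = [0.0] * n_actions
    trusted = []
    for a in range(n_actions):
        if counts[a] < n_min:
            # Uncertain action: copy the baseline probability unchanged.
            policy[a] = baseline_probs[a]
        else:
            trusted.append(a)
    if trusted:
        # Probability mass the baseline puts on trusted actions is
        # moved to the trusted action with the highest estimated value.
        free_mass = sum(baseline_probs[a] for a in trusted)
        best = max(trusted, key=lambda a: q_values[a])
        policy[best] += free_mass
    return policy


# Action 2 has the highest estimated value but only 2 samples, so it
# keeps its baseline probability; the rest of the mass goes to action 1.
example = spibb_greedy_step(
    q_values=[1.0, 2.0, 3.0],
    baseline_probs=[0.5, 0.25, 0.25],
    counts=[20, 15, 2],
    n_min=10,
)
print(example)
```

When every action in a state is uncertain, the sketch returns the baseline policy unchanged, which is the fallback behavior the abstract describes for high uncertainty.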

Taxonomy

Topics: Information and Cyber Security