Safe Policy Improvement with Soft Baseline Bootstrapping

Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes

arXiv:1907.05079 · cs.LG · July 12, 2019 · 1 citation

TL;DR

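This paper introduces a less conservative safe policy improvement method for batch reinforcement learning. It does not forbid every change on uncertain state-action pairs. Instead, it allows controlled deviations from the baseline policy, penalized according to local model uncertainty, while retaining high-probability safety guarantees.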

Contribution

It extends the SPIBB (Safe Policy Improvement with Baseline Bootstrapping) algorithm with a softer, uncertainty-weighted policy constraint. This lets the policy search cover a wider set of policies while maintaining safety guarantees.
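To make the softer constraint concrete, here is a minimal Python sketch for a single state x. It assumes the constraint has the form sum over a of e(x,a)·|π(a|x) − π_b(a|x)| ≤ ε, where e(x,a) is a count-based error bound that shrinks as the pair (x,a) is observed more often. The Hoeffding-style formula in `error_bound` and the helper names `soft_constraint_cost` and `greedy_soft_improvement` are hypothetical illustrations. They are not the paper's exact definitions or algorithm.

```python
import math

def error_bound(count, n_states, n_actions, delta):
    """Assumed Hoeffding-style uncertainty e(x, a) for one state-action pair.

    Hypothetical form for illustration; the paper's exact bound may differ.
    Unobserved pairs get an infinite error, so their probability cannot change.
    """
    if count == 0:
        return math.inf
    return math.sqrt(2.0 / count * math.log(2 * n_states * n_actions / delta))

def soft_constraint_cost(pi, pi_b, errors):
    """Sum over actions of e(x, a) * |pi(a|x) - pi_b(a|x)| for a single state x."""
    total = 0.0
    for p, pb, e in zip(pi, pi_b, errors):
        diff = abs(p - pb)
        if diff > 0:  # skip unchanged pairs so an infinite error costs nothing
            total += e * diff
    return total

def greedy_soft_improvement(q, pi_b, errors, epsilon):
    """Hypothetical greedy step: shift baseline mass toward argmax_a Q(x, a).

    Moving mass m from action a to the best action costs
    m * (e(x, a) + e(x, a_best)) in the soft constraint, so the total
    cost never exceeds the budget epsilon.
    """
    pi = list(pi_b)
    best = max(range(len(q)), key=lambda a: q[a])
    budget = epsilon
    # Take mass from the lowest-valued actions first.
    for a in sorted(range(len(q)), key=lambda a: q[a]):
        if a == best or budget <= 0:
            continue
        unit_cost = errors[a] + errors[best]
        if math.isinf(unit_cost):
            continue  # an unobserved pair: its probability stays at the baseline
        moved = min(pi[a], budget / unit_cost)
        pi[a] -= moved
        pi[best] += moved
        budget -= moved * unit_cost
    return pi

pi = greedy_soft_improvement(
    q=[1.0, 0.5, 2.0], pi_b=[0.4, 0.4, 0.2], errors=[0.5, 0.5, 0.5], epsilon=0.3
)
print([round(p, 3) for p in pi])  # [0.4, 0.1, 0.5]
```

With a budget of ε = 0.3 and equal errors of 0.5, the whole budget is spent moving 0.3 of probability from the lowest-valued action to the best one. A pair that has never been observed gets an infinite error, so its probability cannot change at all. Well-observed pairs, by contrast, can move partially, in proportion to how small their error is.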

Findings

01. Significant performance improvements over existing SPI algorithms.

02. Effective in both finite MDPs and infinite MDPs, the latter with neural network function approximation.

03. Provable safety guarantees with a less conservative approach.

Abstract

Batch Reinforcement Learning (Batch RL) consists of training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) provides guarantees, with high probability, that the trained policy performs better than the behavioural policy, also called the baseline in this setting. Previous work shows that the SPI objective improves mean performance compared with the basic RL objective, which boils down to solving the MDP with maximum likelihood. Here, we build on that work. More precisely, we improve the SPI with Baseline Bootstrapping algorithm (SPIBB) by extending the policy search to a wider set of policies. Instead of classifying the state-action pairs into two sets (the "uncertain" ones and the "safe-to-train-on" ones), we adopt a softer strategy that controls the error in the value estimates by constraining…
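For contrast, the binary classification described in the abstract can be sketched for a single state as follows. This follows our reading of the Π_b-SPIBB variant of the original algorithm. Pairs observed fewer than a threshold number of times keep their baseline probability, and all remaining mass goes to the highest-valued well-observed action. The threshold is represented by the hypothetical parameter `n_wedge`. The function name and signature are illustrative, not taken from the authors' code.

```python
def spibb_binary_improvement(q, pi_b, counts, n_wedge):
    """Sketch of one greedy Pi_b-SPIBB step for a single state.

    Pairs seen fewer than n_wedge times are 'uncertain' and keep their
    baseline probability; the remaining mass goes entirely to the
    best-valued 'safe-to-train-on' action.
    """
    uncertain = [c < n_wedge for c in counts]
    # Uncertain pairs are bootstrapped on the baseline; safe pairs start at 0.
    pi = [pb if unc else 0.0 for pb, unc in zip(pi_b, uncertain)]
    safe_actions = [a for a, unc in enumerate(uncertain) if not unc]
    if safe_actions:
        best = max(safe_actions, key=lambda a: q[a])
        pi[best] = sum(pi_b[a] for a in safe_actions)
    return pi

pi = spibb_binary_improvement(
    q=[1.0, 0.5, 2.0], pi_b=[0.4, 0.4, 0.2], counts=[50, 50, 3], n_wedge=10
)
print(pi)  # [0.8, 0.0, 0.2]
```

In this example, action 2 has the highest estimated value, but it was seen only 3 times. It is therefore locked at its baseline probability of 0.2, and the policy can only shift mass between actions 0 and 1. The soft strategy relaxes exactly this all-or-nothing treatment.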

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Reinforcement Learning in Robotics