Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

Harsh Satija; Philip S. Thomas; Joelle Pineau; Romain Laroche

arXiv:2106.00099 · cs.LG · November 1, 2021 · 5 cites

TL;DR

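This paper introduces a multi-objective safe policy improvement method for offline RL. With high probability, the learned policy performs at least as well as the baseline on every objective while trading off multiple reward signals according to user preferences. The method is demonstrated on a synthetic grid-world task and a real-world sepsis-treatment task.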

Contribution

It extends SPIBB to handle multiple objectives with user preferences, providing high-probability safety guarantees in finite MDPs for offline RL.
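As a rough illustration of the idea being extended, the sketch below applies a single-state, SPIBB-style greedy step (in the spirit of Laroche et al., 2019) to Q-values that have been linearly scalarized with user preference weights. It is a sketch under stated assumptions and not the paper's algorithm: the linear scalarization, the count threshold `n_wedge`, and every function and variable name are hypothetical choices made for this example, and the per-objective safety constraints described in the abstract are not enforced here.

```python
# Illustrative sketch only. Names, the linear scalarization, and the
# threshold n_wedge are assumptions for this example, not the paper's method.

def scalarize(q_per_objective, weights):
    """Combine per-objective Q-values for one state into one value per action."""
    n_actions = len(q_per_objective[0])
    return [
        sum(w * q[a] for w, q in zip(weights, q_per_objective))
        for a in range(n_actions)
    ]


def spibb_greedy_step(pi_b, q, counts, n_wedge):
    """SPIBB-style greedy step for a single state.

    pi_b   : baseline action probabilities in this state
    q      : estimated (here: scalarized) Q-value per action
    counts : number of dataset samples per action in this state
    n_wedge: actions with fewer samples than this keep their baseline probability
    """
    # Rarely observed actions are "bootstrapped": their probability is copied
    # from the baseline because the estimate of q is not trusted there.
    bootstrapped = [n < n_wedge for n in counts]
    new_pi = [p if b else 0.0 for p, b in zip(pi_b, bootstrapped)]

    # The remaining probability mass goes to the best well-observed action.
    free_mass = sum(p for p, b in zip(pi_b, bootstrapped) if not b)
    trusted = [a for a, b in enumerate(bootstrapped) if not b]
    if trusted:
        best = max(trusted, key=lambda a: q[a])
        new_pi[best] += free_mass
    return new_pi


# Toy state with three actions and two objectives (made-up numbers).
pi_b = [0.5, 0.25, 0.25]
counts = [20, 2, 15]                       # action 1 is rarely observed
q_objs = [[1.0, 5.0, 2.0], [0.0, 1.0, 1.0]]
weights = [0.5, 0.5]                       # hypothetical user preferences

q = scalarize(q_objs, weights)
new_pi = spibb_greedy_step(pi_b, q, counts, n_wedge=10)
print(new_pi)  # [0.0, 0.25, 0.75]
```

In this toy state, action 1 has the highest scalarized value but too few samples, so it keeps its baseline probability of 0.25, and the remaining mass moves to action 2, the best of the well-observed actions.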

Findings

01. Effective in a synthetic grid-world safety task

02. Successfully applied to sepsis treatment in critical care

03. Provides high-probability performance guarantees across multiple objectives

Abstract

We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where: (i) we have a dataset collected under a known baseline policy, (ii) multiple reward signals are received from the environment inducing as many objectives to optimize. We present an SPI formulation for this RL setting that takes into account the preferences of the algorithm's user for handling the trade-offs for different reward signals while ensuring that the new policy performs at least as well as the baseline policy along each individual objective. We build on traditional SPI algorithms and propose a novel method based on Safe Policy Iteration with Baseline Bootstrapping (SPIBB, Laroche et al., 2019) that provides high probability guarantees on the performance of the agent in the true environment. We show the effectiveness of our…

Videos

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs · SlidesLive

Taxonomy

Topics: Sepsis Diagnosis and Treatment · Cardiac Arrest and Resuscitation · Respiratory Support and Mechanisms