Safety-Constrained Policy Transfer with Successor Features

Zeyu Feng; Bowen Zhang; Jianxin Bi; Harold Soh

arXiv:2211.05361 · cs.LG · November 11, 2022

TL;DR

This paper introduces a method for safe policy transfer in reinforcement learning that combines successor features with a constrained MDP (CMDP) formulation, so that transferred policies adhere to specified safety constraints. It reports outperforming existing safety-aware transfer methods.

Contribution

It presents a novel extension of generalized policy improvement for constrained settings and a dual optimization algorithm for safe policy transfer using successor features.

Findings

01 Reduces unsafe state visits in simulations

02 Outperforms existing safety-aware transfer methods

03 Effectively separates task goals from safety constraints

Abstract

In this work, we focus on the problem of safe policy transfer in reinforcement learning: we seek to leverage existing policies when learning a new task with specified constraints. This problem is important for safety-critical applications where interactions are costly and unconstrained policies can lead to undesirable or dangerous outcomes, e.g., with physical robots that interact with humans. We propose a Constrained Markov Decision Process (CMDP) formulation that simultaneously enables the transfer of policies and adherence to safety constraints. Our formulation cleanly separates task goals from safety considerations and permits the specification of a wide variety of constraints. Our approach relies on a novel extension of generalized policy improvement to constrained settings via a Lagrangian formulation. We devise a dual optimization algorithm that estimates the optimal dual…
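The abstract names three ingredients: successor features, generalized policy improvement (GPI), and a Lagrangian treatment of the constraint. The following is a minimal sketch of how these could fit together, not the authors' implementation. It assumes that reward and cost are both linear in shared features (weights `w_reward` and `w_cost`), that each source policy's successor features `psi` are already available at the current state, and that the dual variable `lam` is updated by plain projected gradient ascent. All function names, array shapes, and the update rule are hypothetical choices made for illustration.

```python
import numpy as np


def constrained_gpi_action(psi, w_reward, w_cost, lam):
    """Pick an action by GPI on a Lagrangian value (illustrative sketch).

    psi:      array of shape (n_policies, n_actions, d), the successor
              features of each source policy at the current state.
    w_reward: array of shape (d,), assumed linear reward weights.
    w_cost:   array of shape (d,), assumed linear cost weights.
    lam:      non-negative dual variable (Lagrange multiplier).
    """
    q_reward = psi @ w_reward             # (n_policies, n_actions)
    q_cost = psi @ w_cost                 # (n_policies, n_actions)
    q_lagrangian = q_reward - lam * q_cost
    # GPI: take the best source policy per action, then the best action.
    return int(np.argmax(q_lagrangian.max(axis=0)))


def dual_ascent_step(lam, expected_cost, threshold, lr=0.1):
    """Projected gradient ascent on the dual variable (assumed update rule).

    lam grows while the expected cost exceeds the threshold and is
    clipped at zero otherwise.
    """
    return max(0.0, lam + lr * (expected_cost - threshold))


# Toy example: feature 0 acts as reward, feature 1 acts as cost.
psi = np.array([
    [[1.0, 0.0], [3.0, 2.0]],   # source policy 0: actions 0 and 1
    [[2.0, 0.5], [0.5, 0.0]],   # source policy 1: actions 0 and 1
])
w_reward = np.array([1.0, 0.0])
w_cost = np.array([0.0, 1.0])

action_unconstrained = constrained_gpi_action(psi, w_reward, w_cost, lam=0.0)
action_constrained = constrained_gpi_action(psi, w_reward, w_cost, lam=2.0)
```

With `lam = 0.0` the sketch reduces to ordinary GPI and selects the high-reward, high-cost action. Raising `lam` penalizes cost enough to switch the choice to the safer action, which is the separation of task goal and safety constraint that the abstract describes.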

Code & Models

Repositories

clear-nus/SFT-CoP (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Fuel Cells and Related Materials