Safety-Constrained Policy Transfer with Successor Features
Zeyu Feng, Bowen Zhang, Jianxin Bi, Harold Soh

TL;DR
This paper introduces a method for safe policy transfer in reinforcement learning that combines successor features with a constrained MDP (CMDP) formulation, so that safety constraints are respected during transfer. It is reported to outperform existing safety-aware transfer methods.
Contribution
It presents a novel extension of generalized policy improvement for constrained settings and a dual optimization algorithm for safe policy transfer using successor features.
Findings
Reduces unsafe state visits in simulations
Outperforms existing safety-aware transfer methods
Effectively separates task goals from safety constraints
Abstract
In this work, we focus on the problem of safe policy transfer in reinforcement learning: we seek to leverage existing policies when learning a new task with specified constraints. This problem is important for safety-critical applications where interactions are costly and unconstrained policies can lead to undesirable or dangerous outcomes, e.g., with physical robots that interact with humans. We propose a Constrained Markov Decision Process (CMDP) formulation that simultaneously enables the transfer of policies and adherence to safety constraints. Our formulation cleanly separates task goals from safety considerations and permits the specification of a wide variety of constraints. Our approach relies on a novel extension of generalized policy improvement to constrained settings via a Lagrangian formulation. We devise a dual optimization algorithm that estimates the optimal dual…
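To make the combination of generalized policy improvement (GPI) and a Lagrangian relaxation concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes that both the reward and the safety cost are linear in the same features, with hypothetical weight vectors `w_reward` and `w_cost`, and that successor features `psi` for a set of source policies are already available at the current state. The function names, the step size `lr`, and the toy numbers are illustrative only.

```python
import numpy as np

def constrained_gpi_action(psi, w_reward, w_cost, lam):
    """Pick an action by GPI on a Lagrangian-weighted value.

    psi: array of shape (n_policies, n_actions, d) holding successor
         features of each source policy at the current state (assumed given).
    lam: non-negative Lagrange multiplier trading reward against cost.
    """
    w_lagr = w_reward - lam * w_cost      # weights of reward minus lam * cost
    q = psi @ w_lagr                      # shape (n_policies, n_actions)
    return int(np.argmax(q.max(axis=0)))  # best action under the best source policy

def dual_step(lam, est_cost, threshold, lr=0.1):
    """One projected gradient-ascent step on the multiplier (kept >= 0)."""
    return max(0.0, lam + lr * (est_cost - threshold))

# Toy example: feature 0 drives reward, feature 1 drives safety cost.
psi = np.array([[[1.0, 0.0], [2.0, 1.0]],
                [[0.5, 0.0], [3.0, 2.0]]])
w_reward = np.array([1.0, 0.0])
w_cost = np.array([0.0, 1.0])

unconstrained = constrained_gpi_action(psi, w_reward, w_cost, lam=0.0)
cautious = constrained_gpi_action(psi, w_reward, w_cost, lam=2.0)
```

With `lam=0.0` the sketch reduces to ordinary GPI and prefers the high-reward, high-cost action. A larger multiplier shifts the choice toward the action with zero expected cost. `dual_step` raises the multiplier while the estimated cost exceeds the threshold and lowers it otherwise.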
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Fuel Cells and Related Materials
