Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones

Brijen Thananjeyan; Ashwin Balakrishna; Suraj Nair; Michael Luo; Krishnan Srinivasan; Minho Hwang; Joseph E. Gonzalez; Julian Ibarz; Chelsea Finn; Ken Goldberg

arXiv:2010.15920 · cs.LG · May 19, 2021 · 20 cites

TL;DR

Recovery RL uses offline data to learn constraint-violating zones before policy learning and separates the task policy from a recovery policy, improving the tradeoff between safety and task success in reinforcement learning.

Contribution

The paper presents Recovery RL, a safe RL algorithm that learns constraint-violating zones offline and assigns task performance and constraint satisfaction to two separate policies: a task policy and a recovery policy.

Findings

01. Outperforms five prior safe RL methods across six simulation domains.

02. Trades off constraint violations and task successes 2-20 times more efficiently in simulation.

03. Trades off constraint violations and task successes three times more efficiently in physical robot experiments.

Abstract

Safety remains a central obstacle preventing widespread use of RL in the real world: learning new tasks in uncertain environments requires extensive exploration, but safety requires limiting exploration. We propose Recovery RL, an algorithm which navigates this tradeoff by (1) leveraging offline data to learn about constraint violating zones before policy learning and (2) separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes the task reward and a recovery policy that guides the agent to safety when constraint violation is likely. We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task, and an image-based obstacle avoidance task on a physical robot. We compare Recovery RL to 5 prior safe RL methods which jointly optimize for task…
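The two-policy split described in the abstract can be sketched as a simple action-selection rule. This is a minimal illustration, not the authors' implementation: the names `select_action`, `risk_critic`, and `eps_risk`, the thresholding rule, and the toy 1-D environment are all assumptions introduced here. The sketch assumes a risk estimate (standing in for whatever is learned from offline data) that scores how likely a proposed action is to lead to a constraint violation, and hands control to the recovery policy when that score exceeds a threshold.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]


def select_action(
    state: State,
    task_policy: Callable[[State], Action],
    recovery_policy: Callable[[State], Action],
    risk_critic: Callable[[State, Action], float],
    eps_risk: float = 0.3,  # hypothetical risk threshold, not from the source
) -> Tuple[Action, bool]:
    """Return (action, used_recovery).

    The task policy proposes an action that only optimizes task reward.
    If the (assumed) risk estimate for that action exceeds eps_risk,
    the recovery policy's action is executed instead.
    """
    proposed = task_policy(state)
    if risk_critic(state, proposed) > eps_risk:
        return recovery_policy(state), True
    return proposed, False


# Toy 1-D example (entirely hypothetical): positions x >= 1.0 violate a constraint.
def toy_task_policy(state: State) -> Action:
    return [0.5]  # always move right, toward the goal


def toy_recovery_policy(state: State) -> Action:
    return [-0.5]  # always move left, away from the hazard


def toy_risk_critic(state: State, action: Action) -> float:
    # Hand-written stand-in for a learned risk estimate:
    # 1.0 if the next position would violate the constraint, else 0.0.
    return 1.0 if state[0] + action[0] >= 1.0 else 0.0


safe_step = select_action([0.0], toy_task_policy, toy_recovery_policy, toy_risk_critic)
risky_step = select_action([0.8], toy_task_policy, toy_recovery_policy, toy_risk_critic)
```

In this toy setup, the task action is kept at position 0.0 and replaced by the recovery action at position 0.8. Keeping the switch outside both policies means the task policy never has to trade reward against safety itself, which is the separation of goals the abstract describes.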

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Autonomous Vehicle Technology and Safety