MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance

Michael Luo; Ashwin Balakrishna; Brijen Thananjeyan; Suraj Nair; Julian Ibarz; Jie Tan; Chelsea Finn; Ion Stoica; Ken Goldberg

arXiv:2112.03575 · cs.LG · December 8, 2021 · 5 cites

TL;DR

This paper introduces MESA, a meta-learning approach that leverages offline data to quickly adapt risk measures for safe reinforcement learning, significantly reducing constraint violations in new environments.

Contribution

MESA is the first method to meta-learn risk measures for safe RL using offline data, enabling rapid adaptation and improved safety in unseen environments.

Findings

01

MESA reduces constraint violations by up to 50% in new environments.

02

It maintains task performance while improving safety.

03

It is evaluated in simulation across 5 continuous control domains.

Abstract

Safe exploration is critical for using reinforcement learning (RL) in risk-sensitive environments. Recent work learns risk measures that estimate the probability of violating constraints, which can then be used to enforce safety. However, learning such risk measures requires significant interaction with the environment, resulting in excessive constraint violations during learning. Furthermore, these measures are not easily transferable to new environments. We cast safe exploration as an offline meta-RL problem, where the objective is to leverage examples of safe and unsafe behavior across a range of environments to quickly adapt learned risk measures to a new environment with previously unseen dynamics. We then propose MEta-learning for Safe Adaptation (MESA), an approach for meta-learning a risk measure for safe RL. Simulation experiments across 5 continuous control domains suggest that…
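
To make concrete how a risk measure "can then be used" for safety, here is a minimal Python sketch of risk-thresholded action selection. It is an illustration only and is not MESA's algorithm: the function names (`safest_feasible_action`, `toy_risk`, `toy_value`), the threshold `eps_risk`, and the 1-D toy dynamics are all assumptions made for this example, and the risk function is hand-written, whereas MESA meta-learns it from offline data.

```python
from typing import Callable, Sequence


def safest_feasible_action(
    candidates: Sequence[float],
    risk: Callable[[float, float], float],
    value: Callable[[float, float], float],
    state: float,
    eps_risk: float = 0.2,  # hypothetical risk threshold
) -> float:
    """Return the highest-value candidate whose estimated risk is <= eps_risk.

    If no candidate is feasible, fall back to the lowest-risk candidate.
    """
    feasible = [a for a in candidates if risk(state, a) <= eps_risk]
    if feasible:
        return max(feasible, key=lambda a: value(state, a))
    return min(candidates, key=lambda a: risk(state, a))


def toy_risk(state: float, action: float) -> float:
    # Hypothetical stand-in for a learned risk measure: a probability-like
    # score that grows as the next position approaches a wall at x = 1.0.
    return min(1.0, max(0.0, state + action))


def toy_value(state: float, action: float) -> float:
    # Hypothetical task value: larger forward steps are preferred.
    return action


if __name__ == "__main__":
    chosen = safest_feasible_action([-0.5, 0.1, 0.5, 0.9], toy_risk, toy_value, state=0.0)
    print(chosen)
```

In this toy setting the largest step is the most rewarding but also the riskiest, so the filter settles on the largest step that stays under the threshold. Adapting to a new environment would correspond to replacing `toy_risk` with a risk estimate fitted to that environment's dynamics.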

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Occupational Health and Safety Research