Improved Robustness and Safety for Pre-Adaptation of Meta Reinforcement Learning with Prior Regularization

Lu Wen; Songan Zhang; H. Eric Tseng; Baljeet Singh; Dimitar Filev; Huei Peng

arXiv:2108.08448·cs.LG·February 10, 2023

TL;DR

This paper introduces PEARL$^+$, an enhanced meta-reinforcement learning algorithm that improves the safety and robustness of the prior (pre-adaptation) policy when it is first exposed to a new task. It does so by adding a prior regularization term and a new Q-network, and it is validated on safety-critical problems.

Contribution

PEARL$^+$ extends PEARL with a prior regularization term and a new Q-network, making meta-RL policies safer and more robust when they are first exposed to a task.

Findings

01. Significantly improved prior policy safety.
02. Enhanced robustness to task distribution shifts.
03. Validated on safety-critical robotic and autonomous vehicle tasks.

Abstract

Meta Reinforcement Learning (Meta-RL) has seen substantial advancements recently. In particular, off-policy methods were developed to improve the data efficiency of Meta-RL techniques. Probabilistic embeddings for actor-critic RL (PEARL) is a leading approach for multi-MDP adaptation problems. A major drawback of many existing Meta-RL methods, including PEARL, is that they do not explicitly consider the safety of the prior policy when it is exposed to a new task for the first time. Safety is essential for many real-world applications, including field robots and Autonomous Vehicles (AVs). In this paper, we develop the PEARL PLUS (PEARL$^{+}$) algorithm, which optimizes the policy for both prior (pre-adaptation) safety and posterior (after-adaptation) performance. Building on top of PEARL, our proposed PEARL$^{+}$ algorithm introduces a prior regularization term in the reward…
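The abstract is cut off before it defines the prior regularization term, so the following is not PEARL$^{+}$'s actual formulation. It is a hypothetical sketch, assuming the term acts as a weighted penalty subtracted from the task reward; `Transition`, `prior_penalty`, `beta`, and `regularized_reward` are invented names for illustration. The helper `sample_prior_latent` reflects one known property of PEARL: before any context from a new task has been collected, the task latent z is drawn from the unit Gaussian prior N(0, I), which is the pre-adaptation setting where prior safety matters.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    """One environment step (hypothetical fields, for illustration only)."""
    reward: float         # task reward r(s, a)
    prior_penalty: float  # ASSUMED safety cost attributed to the prior policy


def regularized_reward(t: Transition, beta: float) -> float:
    """Hypothetical shaped reward: task reward minus a weighted prior penalty.

    beta = 0 recovers the unregularized task reward; larger beta puts more
    weight on pre-adaptation safety at the expense of raw task reward.
    """
    if beta < 0:
        raise ValueError("beta must be non-negative")
    return t.reward - beta * t.prior_penalty


def sample_prior_latent(dim: int, rng: random.Random) -> list[float]:
    """Draw a task latent z from the unit Gaussian prior N(0, I), as PEARL
    does before any context from the new task has been collected."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_prior_latent(5, rng)
    step = Transition(reward=1.0, prior_penalty=0.5)
    print(len(z), regularized_reward(step, beta=2.0))  # 5 0.0
```

With beta = 0 the shaped reward reduces to the ordinary task reward, which is how, in this sketch, a single coefficient trades prior safety against posterior performance.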