Improved Robustness and Safety for Pre-Adaptation of Meta Reinforcement Learning with Prior Regularization
Lu Wen, Songan Zhang, H. Eric Tseng, Baljeet Singh, Dimitar Filev, Huei Peng

TL;DR
This paper introduces PEARL$^+$, an enhanced meta-reinforcement learning algorithm that improves prior policy safety and robustness during initial task exposure by incorporating regularization and a new Q-network, validated on safety-critical problems.
Contribution
PEARL$^+$ extends PEARL by adding prior regularization and a new Q-network to enhance safety and robustness in meta-RL for initial task exposure.
Findings
Significantly improved prior policy safety.
Enhanced robustness to task distribution shifts.
Validated on safety-critical robotic and autonomous vehicle tasks.
Abstract
Meta Reinforcement Learning (Meta-RL) has seen substantial advancements recently. In particular, off-policy methods were developed to improve the data efficiency of Meta-RL techniques. \textit{Probabilistic embeddings for actor-critic RL} (PEARL) is a leading approach for multi-MDP adaptation problems. A major drawback of many existing Meta-RL methods, including PEARL, is that they do not explicitly consider the safety of the prior policy when it is exposed to a new task for the first time. Safety is essential for many real-world applications, including field robots and Autonomous Vehicles (AVs). In this paper, we develop the PEARL PLUS (PEARL$^+$) algorithm, which optimizes the policy for both prior (pre-adaptation) safety and posterior (after-adaptation) performance. Building on top of PEARL, our proposed PEARL$^+$ algorithm introduces a prior regularization term in the reward…
