A CMDP-within-online framework for Meta-Safe Reinforcement Learning
Vanshaj Khattar, Yuhao Ding, Bilgehan Sel, Javad Lavaei, Ming Jin

TL;DR
This paper introduces a novel meta-safe reinforcement learning framework based on CMDP-within-online methods, providing provable guarantees on reward maximization and constraint satisfaction in both static and dynamic environments.
Contribution
The paper develops the first meta-safe RL framework with provable guarantees, using the CMDP-within-online approach and incorporating task similarity, adaptive learning rates, and off-policy corrections for practical effectiveness.
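As a rough illustration of the within-online idea, the toy sketch below runs a constrained learner inside each task and lets an outer online learner update the initialization across tasks. It is not the paper's algorithm. The quadratic reward, the single linear constraint, the names `within_task` and `meta_train`, and all step sizes are assumptions chosen only to keep the example small and runnable.

```python
import numpy as np

def within_task(theta0, target, limit, steps=50, lr=0.1, tol=1e-3):
    """Toy within-task learner (hypothetical): ascend the reward unless the
    constraint theta[0] <= limit is violated, in which case descend the cost."""
    theta = theta0.copy()
    for _ in range(steps):
        violation = theta[0] - limit
        if violation > tol:
            # Gradient of the cost theta[0] - limit with respect to theta.
            theta = theta - lr * np.array([1.0, 0.0])
        else:
            # Gradient of the reward -||theta - target||^2.
            theta = theta + lr * (-2.0 * (theta - target))
    return theta

def meta_train(targets, limit, meta_lr=0.5):
    """Toy meta-learner (hypothetical): after each task, move the shared
    initialization phi toward that task's final iterate."""
    phi = np.zeros(2)
    inits = []
    for target in targets:
        inits.append(phi.copy())           # initialization used for this task
        theta = within_task(phi, target, limit)
        phi = phi + meta_lr * (theta - phi)
    return phi, inits

# Five identical tasks: the learned initialization drifts toward the shared optimum.
targets = [np.array([0.5, 1.0])] * 5
phi, inits = meta_train(targets, limit=1.0)
```

In this sketch, the more similar the task optima are, the closer the learned initialization sits to each new task's solution, which is the intuition behind bounds that improve with task similarity.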
Findings
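Task-averaged regret bounds are established for both the reward optimality gap and the constraint violations.
The task-averaged optimality gap and constraint satisfaction improve with task similarity in static environments and with task relatedness in dynamic environments.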
Experimental results demonstrate the effectiveness of the proposed approach.
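To make the quantities in these findings concrete, one common way to write them is sketched below. The notation is an assumption made for illustration and may differ from the paper's: $J_{t,0}$ denotes the expected reward of task $t$, $J_{t,i}$ the expected cost for constraint $i$ with threshold $d_{t,i}$, $\pi_t^*$ an optimal feasible policy for task $t$, and $\hat{\pi}_t$ the policy returned by the within-task learner.

```latex
% Constrained MDP (CMDP) for task t, with p constraints (assumed notation)
\max_{\pi} \; J_{t,0}(\pi)
\quad \text{s.t.} \quad J_{t,i}(\pi) \le d_{t,i}, \qquad i = 1, \dots, p.

% Task-averaged optimality gap over T tasks
\bar{R}_0 = \frac{1}{T} \sum_{t=1}^{T}
  \mathbb{E}\!\left[ J_{t,0}(\pi_t^*) - J_{t,0}(\hat{\pi}_t) \right].

% Task-averaged violation of constraint i over T tasks
\bar{R}_i = \frac{1}{T} \sum_{t=1}^{T}
  \mathbb{E}\!\left[ J_{t,i}(\hat{\pi}_t) - d_{t,i} \right],
  \qquad i = 1, \dots, p.
```

Under this reading, a bound on $\bar{R}_0$ controls how far the learned policies fall short of optimal reward on average, and a bound on each $\bar{R}_i$ controls how much the constraints are exceeded on average.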
Abstract
Meta-reinforcement learning has been widely used as a learning-to-learn framework to solve unseen tasks with limited experience. However, constraint violations have not been adequately addressed in existing work, which restricts its application in real-world settings. In this paper, we study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-online framework to establish the first provable guarantees in this important setting. We obtain task-averaged regret bounds for the reward maximization (optimality gap) and constraint violations using gradient-based meta-learning, and show that the task-averaged optimality gap and constraint satisfaction improve with task-similarity in a static environment or task-relatedness in a dynamic environment. Several technical challenges arise when making this framework practical. To this end, we propose…
Taxonomy
Topics: Reinforcement Learning in Robotics
