A CMDP-within-online framework for Meta-Safe Reinforcement Learning

Vanshaj Khattar; Yuhao Ding; Bilgehan Sel; Javad Lavaei; Ming Jin

arXiv:2405.16601 · cs.LG · May 28, 2024 · 2 cites

TL;DR

This paper introduces a novel meta-safe reinforcement learning framework based on CMDP-within-online methods, providing provable guarantees on reward maximization and constraint satisfaction in both static and dynamic environments.

Contribution

It develops the first provable meta-safe RL framework using a CMDP-within-online approach, and makes it practical by incorporating task similarity, adaptive learning rates, and off-policy corrections.

Findings

1. Task-averaged regret bounds are established for the reward optimality gap and for constraint violations.

2. The optimality gap and constraint satisfaction improve with greater task similarity (static environment) or task relatedness (dynamic environment).

3. Experimental results demonstrate the effectiveness of the proposed approach.

Abstract

Meta-reinforcement learning has been widely used as a learning-to-learn framework for solving unseen tasks with limited experience. However, existing work has not adequately addressed constraint violations, which restricts its application in real-world settings. In this paper, we study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-online framework to establish the first provable guarantees in this important setting. Using gradient-based meta-learning, we obtain task-averaged regret bounds for reward maximization (the optimality gap) and for constraint violations, and we show that the task-averaged optimality gap and constraint satisfaction improve with task similarity in a static environment or with task relatedness in a dynamic environment. Several technical challenges arise when making this framework practical. To this end, we propose…

Videos

A CMDP-within-online framework for Meta-Safe Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics