A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety
Ankita Kushwaha, Kiran Ravish, Preeti Lamba, and Pawan Kumar

TL;DR
This survey provides a comprehensive, mathematically rigorous overview of Safe Reinforcement Learning and Constrained MDPs, covering theoretical foundations, algorithms, and open research challenges for single-agent and multi-agent systems.
Contribution
It offers a detailed synthesis of SafeRL formulations, algorithms, and open problems, with particular emphasis on recent advances in SafeMARL and on safety guarantees.
Findings
Summarizes state-of-the-art SafeRL algorithms with safety guarantees.
Reviews theoretical foundations of Constrained Markov Decision Processes.
Proposes five open research problems in SafeRL and SafeMARL.
Abstract
Safe Reinforcement Learning (SafeRL) is the subfield of reinforcement learning that explicitly deals with safety constraints during the learning and deployment of agents. This survey provides a mathematically rigorous overview of SafeRL formulations based on Constrained Markov Decision Processes (CMDPs) and extensions to Multi-Agent Safe RL (SafeMARL). We review theoretical foundations of CMDPs, covering definitions, constrained optimization techniques, and fundamental theorems. We then summarize state-of-the-art algorithms in SafeRL for single agents, including policy gradient methods with safety guarantees and safe exploration strategies, as well as recent advances in SafeMARL for cooperative and competitive settings. Additionally, we propose five open research problems to advance the field, with three focusing on SafeMARL. Each problem is described with motivation, key challenges,…
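To make the CMDP formulation and the "constrained optimization techniques" mentioned in the abstract concrete, a standard discounted CMDP with m cost constraints and its Lagrangian relaxation can be written as follows. The notation (reward r, costs c_i, thresholds d_i, discount γ, multipliers λ_i) is a common textbook convention assumed here for illustration, not necessarily the survey's own.

```latex
% Discounted CMDP with m cost constraints (notation assumed for illustration)
\begin{aligned}
\max_{\pi}\quad & J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big] \\
\text{s.t.}\quad & J_{C_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\Big] \le d_i, \qquad i = 1, \dots, m
\end{aligned}

% Lagrangian relaxation with multipliers \lambda_i \ge 0
\mathcal{L}(\pi, \boldsymbol{\lambda}) = J_R(\pi) - \sum_{i=1}^{m} \lambda_i \big( J_{C_i}(\pi) - d_i \big),
\qquad
\max_{\pi} \; \min_{\boldsymbol{\lambda} \ge 0} \; \mathcal{L}(\pi, \boldsymbol{\lambda})
```

For a feasible policy the inner minimization returns J_R(π) (at λ = 0), while for an infeasible one it diverges to −∞, so the max-min problem recovers the constrained one. Primal-dual methods typically approximate it by alternating a policy update that increases 𝓛 with a projected update that raises λ_i whenever constraint i is violated.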
