DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention
David Mguni, Usman Islam, Yaqi Sun, Xiuling Zhang, Joel Jennings, Aivar Sootla, Changmin Yu, Ziyan Wang, Jun Wang, Yaodong Yang

TL;DR
DESTA introduces a two-agent framework for safe reinforcement learning that minimizes safety violations while maximizing task rewards, ensuring safer exploration and policy improvement.
Contribution
The paper proposes DESTA, a novel two-player game framework for safe RL that learns to minimize safety violations during training and testing.
Findings
DESTA effectively reduces safety violations in RL tasks.
DESTA improves safety of existing policies in benchmark environments.
DESTA outperforms leading RL methods in safety and performance.
Abstract
Reinforcement learning (RL) involves performing exploratory actions in an unknown system. This can place a learning agent in dangerous and potentially catastrophic system states. Current approaches for tackling safe learning in RL simultaneously trade off safe exploration and task fulfillment. In this paper, we introduce a new generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by the safe policy. Our approach introduces a novel two-player framework for safe RL called Distributive Exploration Safety Training Algorithm (DESTA). The core of DESTA is a game between two adaptive agents: a Safety Agent that is delegated the task of minimising safety violations and a Task Agent whose goal is to maximise the environment reward. Specifically, Safety Agent can selectively take control of the system at any given point…
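The control hand-off described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: in DESTA both agents are adaptive and learned, whereas here the environment (a one-dimensional walk with an unsafe region at or beyond `limit`), the policies, and the intervention rule `should_intervene` are all hand-coded, hypothetical stand-ins chosen only to show one agent selectively taking control from the other.

```python
# Toy illustration only: every name and rule below is a hypothetical stand-in,
# not part of DESTA. The real Safety Agent and Task Agent are learned.

def task_policy(state):
    """Hypothetical Task Agent: always step up, since reward grows with position."""
    return 1

def safety_policy(state):
    """Hypothetical Safety Agent action: step back, away from the unsafe region."""
    return -1

def should_intervene(state, limit):
    """Hand-coded intervention rule (assumption): take control when one more
    step up would enter the unsafe region, i.e. state + 1 >= limit."""
    return state + 1 >= limit

def run_episode(steps=20, limit=5, use_safety_agent=True):
    """Run one episode and return (task_reward, violations, interventions)."""
    state = 0
    task_reward = 0
    violations = 0
    interventions = 0
    for _ in range(steps):
        if use_safety_agent and should_intervene(state, limit):
            # Safety Agent takes control for this step.
            action = safety_policy(state)
            interventions += 1
        else:
            # Task Agent keeps control.
            action = task_policy(state)
        state += action
        task_reward += state                 # reward: current position
        violations += int(state >= limit)    # cost: being in the unsafe region
    return task_reward, violations, interventions

if __name__ == "__main__":
    print("with safety agent:   ", run_episode(use_safety_agent=True))
    print("without safety agent:", run_episode(use_safety_agent=False))
```

In this toy setting the intervening agent gives up some task reward to avoid the unsafe region entirely, which mirrors the trade-off the abstract describes: task reward is maximised only to the extent the safe policy tolerates.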
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Optimization and Search Problems
