TL;DR
This paper introduces IR^2L, a reinforcement learning method that uses a pre-trained instinct network to override unsafe actions, significantly reducing safety violations during training in safety-critical environments.
Contribution
The paper proposes a transferable instinct network, trained on a single task and then applied to new environments, that enhances safety in reinforcement learning by overriding unsafe actions.
Findings
IR^2L reduces safety violations during training compared to a baseline RL agent.
IR^2L achieves task performance similar to the baseline.
The pre-trained instinct network transfers effectively to new environments.
Abstract
Random exploration is one of the main mechanisms through which reinforcement learning (RL) finds well-performing policies. However, it can lead to undesirable or catastrophic outcomes when learning online in safety-critical environments. In fact, safe learning is one of the major obstacles towards real-world agents that can learn during deployment. One way of ensuring that agents respect hard limitations is to explicitly configure boundaries in which they can operate. While this might work in some cases, we do not always have clear a priori information about which states and actions can lead dangerously close to hazardous states. Here, we present an approach where an additional policy can override the main policy and offer a safer alternative action. In our instinct-regulated RL (IR^2L) approach, an "instinctual" network is trained to recognize undesirable situations, while guarding the…
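The override idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract is truncated here and does not specify how the instinct decides to intervene. The function `regulated_action`, the risk `threshold`, the `(risk, safe_action)` return shape, and the hand-coded `toy_instinct` rule (standing in for a trained instinct network) are all assumptions made for illustration.

```python
import random
from typing import Callable, Sequence, Tuple

State = Sequence[float]
Action = float


def regulated_action(
    state: State,
    main_policy: Callable[[State], Action],
    instinct: Callable[[State, Action], Tuple[float, Action]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Tuple[Action, bool]:
    """Return (action, overridden) for one step.

    The main policy proposes an action. The instinct rates how risky that
    proposal is and suggests an alternative. If the risk exceeds the
    threshold, the alternative replaces the proposal.
    """
    proposed = main_policy(state)
    risk, safe_action = instinct(state, proposed)
    if risk > threshold:
        return safe_action, True
    return proposed, False


# Toy 1-D world (assumed): the state is a position and positions >= 1.0
# are hazardous. Actions are step sizes.
def exploring_policy(state: State) -> Action:
    # Random exploration, as a learning agent might do early in training.
    return random.uniform(-0.5, 0.5)


def toy_instinct(state: State, action: Action) -> Tuple[float, Action]:
    # Hand-written stand-in for a trained instinct network: flag any step
    # that would end at or beyond a 0.8 safety margin and suggest a retreat.
    next_pos = state[0] + action
    risk = 1.0 if next_pos >= 0.8 else 0.0
    return risk, -0.1


random.seed(0)
pos = 0.0
overrides = 0
for _ in range(200):
    action, overridden = regulated_action([pos], exploring_policy, toy_instinct)
    overrides += int(overridden)
    pos += action
```

In this toy setup the position never reaches the hazard, because every step that would cross the margin is replaced by a retreat. The main policy is left unchanged, which mirrors the separation between the task policy and the guarding policy described above.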
