Do No Harm: A Counterfactual Approach to Safe Reinforcement Learning
Sean Vaskov, Wilko Schwarting, Chris L. Baker

TL;DR
This paper introduces a counterfactual constraint method for safe reinforcement learning that penalizes only the harm caused by the policy, improving safety in complex environments.
Contribution
The paper proposes a counterfactual harm-based constraint formulation that preserves safety without over-penalizing agents in states where constraint violation is unavoidable.
Findings
Agents learn safer policies in simulation environments.
The method outperforms existing constrained RL approaches.
Simulation results demonstrate improved safety and feasibility.
Abstract
Reinforcement Learning (RL) for control has become increasingly popular due to its ability to learn rich feedback policies that take into account uncertainty and complex representations of the environment. When considering safety constraints, constrained optimization approaches, where agents are penalized for constraint violations, are commonly used. In such methods, if agents are initialized in, or must visit, states where constraint violation might be inevitable, it is unclear how much they should be penalized. We address this challenge by formulating a constraint on the counterfactual harm of the learned policy compared to a default, safe policy. In a philosophical sense this formulation only penalizes the learner for constraint violations that it caused; in a practical sense it maintains feasibility of the optimal control problem. We present simulation studies on a rover with…
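The idea of penalizing only the harm the learner causes can be illustrated with a small sketch. The excerpt above does not give the paper's exact definition, so everything here is an assumption made for illustration: harm is taken to be the amount by which the learner's discounted constraint cost exceeds that of the default policy from the same start state, clipped at zero, and the names `discounted_cost` and `counterfactual_harm` are hypothetical.

```python
from typing import Sequence


def discounted_cost(costs: Sequence[float], gamma: float = 0.99) -> float:
    """Sum of per-step constraint costs, discounted by gamma**t."""
    total = 0.0
    for t, c in enumerate(costs):
        total += (gamma ** t) * c
    return total


def counterfactual_harm(
    learner_costs: Sequence[float],
    default_costs: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Illustrative harm measure (an assumption, not the paper's definition).

    Both cost sequences are assumed to come from rollouts that start in the
    same state, one under the learned policy and one under the default policy.
    Only cost in excess of the default policy's cost counts as harm.
    """
    excess = discounted_cost(learner_costs, gamma) - discounted_cost(default_costs, gamma)
    return max(0.0, excess)


# Start state where one violation is inevitable: the default policy also incurs it.
unavoidable = counterfactual_harm([0.0, 1.0, 0.0], [0.0, 1.0, 0.0], gamma=1.0)

# The learner adds a second violation that the default policy would have avoided.
caused = counterfactual_harm([0.0, 1.0, 1.0], [0.0, 1.0, 0.0], gamma=1.0)

print(unavoidable, caused)
```

Under this assumed definition, a constraint of the form "harm must not exceed zero" is always satisfiable, because the default policy itself has zero harm. That matches the abstract's point about feasibility: a plain cost constraint would charge the learner for the inevitable violation in the first rollout, while the counterfactual version charges it only for the extra violation in the second.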
Taxonomy
Topics: Ethics and Social Impacts of AI · Psychology of Moral and Emotional Judgment
