Do No Harm: A Counterfactual Approach to Safe Reinforcement Learning

Sean Vaskov; Wilko Schwarting; Chris L. Baker

arXiv:2405.11669·cs.LG·May 21, 2024

TL;DR

This paper introduces a counterfactual constraint method for safe reinforcement learning that penalizes only the harm caused by the policy, improving safety in complex environments.

Contribution

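The paper proposes a counterfactual harm-based constraint: the learned policy is penalized only for constraint violations it causes relative to a default, safe policy. This avoids over-penalizing agents in states where violation is unavoidable and keeps the constrained problem feasible.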

Findings

01 Agents learn safer policies in simulation environments.

02 The method outperforms existing constrained RL approaches.

03 Simulation results demonstrate improved safety and feasibility.

Abstract

Reinforcement Learning (RL) for control has become increasingly popular due to its ability to learn rich feedback policies that take into account uncertainty and complex representations of the environment. When considering safety constraints, constrained optimization approaches, where agents are penalized for constraint violations, are commonly used. In such methods, if agents are initialized in, or must visit, states where constraint violation might be inevitable, it is unclear how much they should be penalized. We address this challenge by formulating a constraint on the counterfactual harm of the learned policy compared to a default, safe policy. In a philosophical sense this formulation only penalizes the learner for constraint violations that it caused; in a practical sense it maintains feasibility of the optimal control problem. We present simulation studies on a rover with…
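
The idea of penalizing only the harm the learner causes can be illustrated with a minimal sketch. It assumes harm is measured as the positive excess of the learner's constraint cost over the default policy's cost on the same scenario, and that the constraint is enforced with a Lagrange multiplier. The function names, the harm budget, and the update rule are hypothetical illustrations, not the paper's implementation.

```python
from typing import Sequence


def counterfactual_harm(learner_costs: Sequence[float],
                        default_costs: Sequence[float]) -> list[float]:
    """Per-scenario harm: cost the learner incurred beyond the default policy.

    Assumption: both sequences hold constraint costs for the SAME scenarios
    (e.g. identical start states), so the difference is a counterfactual one.
    """
    if len(learner_costs) != len(default_costs):
        raise ValueError("cost sequences must cover the same scenarios")
    # Violations the default policy would also have incurred are not penalized.
    return [max(0.0, l - d) for l, d in zip(learner_costs, default_costs)]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def lagrangian(mean_return: float, mean_harm: float,
               multiplier: float, harm_budget: float = 0.0) -> float:
    """Penalized objective: return minus the weighted harm above the budget."""
    return mean_return - multiplier * (mean_harm - harm_budget)


def update_multiplier(multiplier: float, mean_harm: float,
                      harm_budget: float = 0.0, step: float = 0.1) -> float:
    """Dual ascent step; the multiplier grows while harm exceeds the budget."""
    return max(0.0, multiplier + step * (mean_harm - harm_budget))


if __name__ == "__main__":
    # Scenario 1: violation was inevitable (default policy also pays cost 1.0).
    # Scenario 2: learner is safer than the default.
    # Scenario 3: learner causes 1.5 more cost than the default.
    harm = counterfactual_harm([1.0, 0.0, 2.0], [1.0, 0.5, 0.5])
    print(harm, mean(harm))
```

In this sketch the first scenario contributes zero harm even though the learner violates the constraint, because the default policy violates it equally. A learner that simply copies the default policy therefore always satisfies a zero harm budget, which is one way to see why the constrained problem stays feasible.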

Taxonomy

Topics: Ethics and Social Impacts of AI · Psychology of Moral and Emotional Judgment