Safety Representations for Safer Policy Learning

Kaustubh Mani; Vincent Mai; Charlie Gauthier; Annie Chen; Samer Nashed; Liam Paull

arXiv:2502.20341 · cs.LG · February 28, 2025

TL;DR

This paper introduces a novel method for reinforcement learning that learns safety representations to enable safer and more efficient exploration in safety-critical environments, outperforming existing constrained exploration techniques.

Contribution

We propose a method that explicitly learns state-conditioned safety representations to improve safe exploration without excessive caution, enhancing policy learning efficiency.

Findings

01. Significantly improves task performance in safety-critical environments

02. Reduces constraint violations during training

03. Balances exploration and safety effectively

Abstract

Reinforcement learning algorithms typically necessitate extensive exploration of the state space to find optimal policies. However, in safety-critical applications, the risks associated with such exploration can lead to catastrophic consequences. Existing safe exploration methods attempt to mitigate this by imposing constraints, which often result in overly conservative behaviours and inefficient learning. Heavy penalties for early constraint violations can trap agents in local optima, deterring exploration of risky yet high-reward regions of the state space. To address this, we introduce a method that explicitly learns state-conditioned safety representations. By augmenting the state features with these safety representations, our approach naturally encourages safer exploration without being excessively cautious, resulting in more efficient and safer policy learning in safety-critical…
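The abstract does not spell out what a safety representation looks like, but Reviewer 03 describes the paper's function as mapping states to a distribution over timesteps. The following is a minimal sketch under that reading, not the authors' implementation: it builds one-hot "steps until the next constraint violation" targets from a single recorded episode and concatenates such a vector onto the state features. The function names, the binning scheme, and the use of one-hot targets in place of a learned model's predicted distribution are all assumptions made for illustration.

```python
import numpy as np


def steps_to_violation_bins(violations, num_bins):
    """Hypothetical target construction for one episode.

    violations: sequence of booleans, True where a constraint was violated.
    Returns an array of shape (T, num_bins). Row t is one-hot: bin k means
    the next violation is exactly k steps away; the last bin means it is at
    least num_bins - 1 steps away or never occurs in this episode.
    """
    T = len(violations)
    labels = np.zeros((T, num_bins))
    next_violation = None
    # Walk backwards so the index of the next violation is always known.
    for t in range(T - 1, -1, -1):
        if violations[t]:
            next_violation = t
        if next_violation is None:
            k = num_bins - 1
        else:
            k = min(next_violation - t, num_bins - 1)
        labels[t, k] = 1.0
    return labels


def augment_state(state, safety_repr):
    """Concatenate state features with a safety representation vector."""
    return np.concatenate([state, safety_repr], axis=-1)


# Usage: a 4-step episode with a violation at step 2, using 3 bins.
labels = steps_to_violation_bins([False, False, True, False], num_bins=3)
augmented = augment_state(np.array([0.5, -1.0]), labels[1])
print(augmented.shape)
```

In an actual training loop, a model fitted to targets like these would supply a predicted distribution for each state, and the policy would consume the augmented vector in place of the raw state.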

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The idea of state-conditioned risk representations is intuitive and reasonable, and is a novel approach in the safety RL community.
2. The method is sound and shows significant improvements in multiple environments, and the transferability of the risk models is also tested.
3. The structure of the paper is clear and the writing is easy to follow.

Weaknesses

1. Related work from the last two years seems to be missing, and the latest baselines that the authors compare with are CSC and TRPO-PID, which were published in 2020. Are there more recent works in the safety RL literature? If there is not enough time for an experimental comparison, that's fine (though a comparison would be preferred); it would be nice to mention the most recent advances and the advantages of the proposed method compared with them. (I am not an expert in safety RL, but would love to see clarification

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. The paper is very well-written.
2. The motivational example is great! I would recommend getting the message a bit more fleshed out. That is, that adding a distance measure in this example with sparse rewards made a huge difference for training. The paper tries to do the same trick in safe RL, where we often treat safety similarly to RL in the sparse reward setting: either safe or not safe.
3. The paper has a great ablation study.

Weaknesses

1. The CPO baseline is fine, but not amazing. I think it would be good to use at least one more baseline for demonstration. There are a few implementations in https://github.com/PKU-Alignment/safety-gymnasium
2. Ideally, it would be good to see more risk measure distances compared to the presented one.

Reviewer 03 · Rating 6 · Confidence 5

Strengths

- **[S1] Important topic:** Safe / Constrained RL is an important topic, and training policies that satisfy safety constraints without being overly conservative is an important goal.
- **[S2] Compatible with many RL algorithms:** The proposed method augments the state with a learned representation, so it can be applied to many different RL algorithms. This is also demonstrated in the experiments, where it is applied to 4 different safe RL algorithms.
- **[S3] Experiments across diverse set of ta

Weaknesses

**[W1] Confusing use of the term “risk” throughout the paper**
- A risk measure is a function that maps a random variable to a scalar value (e.g., CVaR). Risk-sensitive / risk-aware / risk-averse RL methods typically apply a risk measure over a distribution of future returns. This is very different from the ideas proposed in this work, where a “risk function” maps states to a distribution over timesteps. I strongly suggest that the authors change the term “risk-informed” to something diffe

Videos

Safety Representations for Safer Policy Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)