Safety Representations for Safer Policy Learning
Kaustubh Mani, Vincent Mai, Charlie Gauthier, Annie Chen, Samer Nashed, Liam Paull

TL;DR
This paper introduces a novel method for reinforcement learning that learns safety representations to enable safer and more efficient exploration in safety-critical environments, outperforming existing constrained exploration techniques.
Contribution
We propose a method that explicitly learns state-conditioned safety representations to improve safe exploration without excessive caution, enhancing policy learning efficiency.
Findings
Significantly improves task performance in safety-critical environments
Reduces constraint violations during training
Balances exploration and safety effectively
Abstract
Reinforcement learning algorithms typically necessitate extensive exploration of the state space to find optimal policies. However, in safety-critical applications, the risks associated with such exploration can lead to catastrophic consequences. Existing safe exploration methods attempt to mitigate this by imposing constraints, which often result in overly conservative behaviours and inefficient learning. Heavy penalties for early constraint violations can trap agents in local optima, deterring exploration of risky yet high-reward regions of the state space. To address this, we introduce a method that explicitly learns state-conditioned safety representations. By augmenting the state features with these safety representations, our approach naturally encourages safer exploration without being excessively cautious, resulting in more efficient and safer policy learning in safety-critical…
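The state-augmentation idea in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes the safety representation is a categorical distribution over binned steps-to-violation, which is consistent with the reviewers' description of a function mapping states to a distribution over timesteps. The function names, the binning scheme, and the one-hot training targets are all hypothetical.

```python
from typing import List, Optional, Sequence


def steps_to_violation(costs: Sequence[float]) -> List[Optional[int]]:
    """For each timestep of one episode, count the steps until the next
    constraint violation (cost > 0). None means no later violation was observed."""
    labels: List[Optional[int]] = [None] * len(costs)
    next_violation: Optional[int] = None
    # Walk the episode backwards so each step knows the nearest future violation.
    for t in range(len(costs) - 1, -1, -1):
        if costs[t] > 0:
            next_violation = t
        labels[t] = None if next_violation is None else next_violation - t
    return labels


def to_distribution(steps: Optional[int], num_bins: int, bin_width: int) -> List[float]:
    """One-hot target over distance bins (an assumed encoding).
    The last bin collects 'at or beyond the horizon' and 'no violation observed'."""
    probs = [0.0] * num_bins
    if steps is None:
        idx = num_bins - 1
    else:
        idx = min(steps // bin_width, num_bins - 1)
    probs[idx] = 1.0
    return probs


def augment_state(state: Sequence[float], safety_repr: Sequence[float]) -> List[float]:
    """Concatenate raw state features with the safety representation."""
    return list(state) + list(safety_repr)


# Usage: build targets from a logged episode, then augment one state.
episode_costs = [0, 0, 1, 0, 0]
labels = steps_to_violation(episode_costs)
target = to_distribution(labels[0], num_bins=4, bin_width=1)
policy_input = augment_state([0.5, -1.0], target)
```

In a full system, a learned model would predict this distribution from the state, and the policy would receive the prediction rather than the ground-truth target. Because the change is confined to the policy's input, the same augmentation can sit in front of different RL algorithms.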
Peer Reviews
Decision: ICLR 2025 Poster
1. The idea of state-conditioned risk representations is intuitive and reasonable, and is a novel approach in the safety RL community.
2. The method is sound and shows significant improvements in multiple environments, and the transferability of the risk models is also tested.
3. The structure of the paper is clear and the writing is easy to follow.
1. Related work from the last two years seems to be missing, and the latest baselines that the authors compare with are CSC and TRPO-PID, which were published in 2020. Are there other, more recent works in the safety RL literature? If there is not enough time for an experimental comparison, that is fine (though a comparison would be preferred); it would be nice to mention the most recent advances and the advantages of the proposed method compared with them. (I am not an expert in safety RL, but would love to see clarification
1. The paper is very well-written.
2. The motivational example is great! I would recommend fleshing out its message a bit more: namely, that adding a distance measure to this sparse-reward example made a huge difference for training. The paper tries the same trick in safe RL, where safety is often treated much like a sparse reward: a state is either safe or not safe.
3. The paper has a great ablation study.
1. The CPO baseline is fine, but not amazing. I think it would be good to use at least one more baseline for demonstration. There are a few implementations in https://github.com/PKU-Alignment/safety-gymnasium
2. Ideally, it would be good to see more risk measure distances compared to the presented one.
- **[S1] Important topic:** Safe / Constrained RL is an important topic, and training policies that satisfy safety constraints without being overly conservative is an important goal.
- **[S2] Compatible with many RL algorithms:** The proposed method augments the state with a learned representation, so it can be applied to many different RL algorithms. This is also demonstrated in the experiments, where it is applied to 4 different safe RL algorithms.
- **[S3] Experiments across diverse set of ta
**[W1] Confusing use of the term “risk” throughout the paper** - A risk measure is a function that maps a random variable to a scalar value (e.g., CVaR). Risk-sensitive / risk-aware / risk-averse RL methods typically apply a risk measure over a distribution of future returns. This is very different from the ideas proposed in this work, where a “risk function” is mapping states to a distribution over timesteps. I strongly suggest that the authors change the term “risk-informed” to something diffe
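To make the reviewer's distinction concrete, here is a minimal sketch of a risk measure in the classical sense: empirical CVaR, which maps a sample of a random variable (here, episode costs) to a single scalar. The function name and the convention that higher cost is worse are assumptions made for illustration. Nothing here is taken from the paper.

```python
import math
from typing import Sequence


def cvar(costs: Sequence[float], alpha: float) -> float:
    """Empirical CVaR at level alpha: the mean of the worst
    ceil(alpha * n) samples, where a higher cost is worse."""
    if not costs or not (0.0 < alpha <= 1.0):
        raise ValueError("need a non-empty sample and 0 < alpha <= 1")
    k = max(1, math.ceil(alpha * len(costs)))
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k


# Usage: average over the worst half of four sampled episode costs.
tail_risk = cvar([1.0, 2.0, 3.0, 10.0], alpha=0.5)
```

The output is one number summarizing the tail of a cost distribution. The "risk function" the reviewer describes instead returns, for each state, a whole distribution over timesteps, which is why the reviewer asks for a different term.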
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
