Optimal Transport-Guided Safety in Temporal Difference Reinforcement Learning
Zahra Shahrooei, Ali Baheri

TL;DR
This paper introduces a novel reinforcement learning algorithm that uses optimal transport theory to quantify action uncertainty, promoting safer decision-making while maintaining performance in uncertain environments.
Contribution
The paper proposes a temporal difference algorithm that incorporates an optimal transport-based uncertainty score into the decision-making objective to make reinforcement learning policies safer.
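As a rough illustration of what an optimal transport-based uncertainty score can look like, the sketch below scores one state-action pair by the 1-Wasserstein (W1) distance between the empirical distribution of sampled next-state outcomes and a point mass at their mean. The scalar outcome representation, the point-mass reference, and the function names `wasserstein_1d` and `ot_uncertainty` are hypothetical choices for illustration. They are not the authors' definition.

```python
from statistics import mean

def wasserstein_1d(xs, ys):
    """W1 distance between two equally weighted empirical distributions on the
    real line with the same number of samples. In 1-D the optimal transport plan
    matches the sorted samples pairwise."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of equal size")
    return mean(abs(a - b) for a, b in zip(sorted(xs), sorted(ys)))

def ot_uncertainty(next_outcomes):
    """Hypothetical uncertainty score for one (state, action) pair: the W1
    distance from the sampled next-state outcomes to a point mass at their mean.
    A score of 0 means every sampled outcome was identical (fully predictable)."""
    nominal = mean(next_outcomes)
    return wasserstein_1d(next_outcomes, [nominal] * len(next_outcomes))

# Two actions with the same average outcome but different spread.
steady = [1.0, 1.0, 1.0, 1.0]
risky = [0.0, 2.0, 0.0, 2.0]
print(ot_uncertainty(steady))  # 0.0
print(ot_uncertainty(risky))   # 1.0
```

With a point-mass reference, this score reduces to the mean absolute deviation of the outcomes. Comparing against a richer reference distribution only changes the second argument passed to `wasserstein_1d`.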
Findings
Reduces the probability of visiting unsafe states (proved theoretically)
Maintains performance under environment uncertainty
Provides safer decision-making in stochastic settings
Abstract
The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance, frequently without considering safety. In contrast, safe reinforcement learning seeks to reduce or avoid unsafe behavior. This paper views safety as taking actions with more predictable consequences under environment stochasticity and introduces a temporal difference algorithm that uses optimal transport theory to quantify the uncertainty associated with actions. By integrating this uncertainty score into the decision-making objective, the agent is encouraged to favor actions with more predictable outcomes. We theoretically prove that our algorithm leads to a reduction in the probability of visiting unsafe states. We evaluate the proposed algorithm on several case studies in the presence of various forms of environment uncertainty. The results demonstrate that our…
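The abstract does not spell out how the uncertainty score enters the decision-making objective. One simple reading is tabular Q-learning in which actions are chosen to maximize Q(s, a) − λ·U(s, a). In the sketch below, U is a precomputed table of uncertainty scores (for example, OT-based ones) and λ ≥ 0 is a trade-off weight. The penalty form, the parameter `lam`, and the helper names `select_action` and `td_update` are hypothetical. They are not the paper's algorithm.

```python
def select_action(Q, U, state, lam):
    """Greedy action under the assumed penalized objective Q(s, a) - lam * U(s, a).
    A larger lam favours actions with more predictable outcomes (lower U)."""
    actions = range(len(Q[state]))
    return max(actions, key=lambda a: Q[state][a] - lam * U[state][a])

def td_update(Q, U, s, a, r, s_next, lam, alpha=0.1, gamma=0.99, done=False):
    """One Q-learning-style temporal difference update of Q[s][a], in place.
    The bootstrap target uses the action the penalized policy would choose in
    s_next; this is an assumption, and a plain max over Q[s_next] is another option.
    Returns the TD error."""
    if done:
        target = r
    else:
        a_next = select_action(Q, U, s_next, lam)
        target = r + gamma * Q[s_next][a_next]
    td_error = target - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error

# State 0: action 1 has the higher value but also the higher uncertainty.
Q = [[1.0, 1.2], [0.0, 0.0]]
U = [[0.0, 1.0], [0.0, 0.0]]
print(select_action(Q, U, 0, lam=0.0))  # 1 (pure value maximization)
print(select_action(Q, U, 0, lam=0.5))  # 0 (uncertainty penalty dominates)
```

Setting `lam=0` recovers ordinary greedy Q-learning. Increasing it trades expected return for predictability, which mirrors the abstract's idea of favouring actions with more predictable outcomes.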
