Optimal Transport-Assisted Risk-Sensitive Q-Learning
Zahra Shahrooei, Ali Baheri

TL;DR
This paper introduces a risk-sensitive Q-learning algorithm that uses optimal transport theory to improve agent safety. In a Gridworld task, it visits risky states less often and converges faster than traditional Q-learning.
Contribution
It integrates optimal transport into Q-learning to explicitly incorporate safety preferences and improve risk management in decision-making policies.
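One way to write down the trade-off described in the abstract is as a penalized objective. This is only a sketch under stated assumptions: the weight \lambda, the ground cost c, and the symbols d_\pi (the policy's stationary state distribution) and \mu_{\mathrm{risk}} (the predefined risk distribution) are notation introduced here for illustration, and the paper may combine the two terms differently.

```latex
\max_{\pi} \;
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \right]
  \;-\; \lambda \, W\!\left(d_{\pi}, \mu_{\mathrm{risk}}\right),
\qquad
W\!\left(d_{\pi}, \mu_{\mathrm{risk}}\right)
  = \min_{P \in \Pi(d_{\pi},\, \mu_{\mathrm{risk}})}
    \sum_{s, s'} P(s, s') \, c(s, s')
```

Here \Pi(d_\pi, \mu_{\mathrm{risk}}) is the set of joint distributions (transport plans) whose marginals are d_\pi and \mu_{\mathrm{risk}}, so W is the standard Wasserstein-1 (optimal transport) cost between the two distributions.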
Findings
Reduces visits to risky states in a Gridworld environment
Achieves faster convergence than traditional Q-learning
Enhances safety in reinforcement learning policies
Abstract
The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance without considering risk or safety. In contrast, safe reinforcement learning aims to mitigate or avoid unsafe states. This paper presents a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance agent safety. By integrating optimal transport into the Q-learning framework, our approach seeks to optimize the policy's expected return while minimizing the Wasserstein distance between the policy's stationary distribution and a predefined risk distribution, which encapsulates safety preferences from domain experts. We validate the proposed algorithm in a Gridworld environment. The results indicate that our method significantly reduces the frequency of visits to risky states and achieves faster convergence to a stable policy compared to the…
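The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a one-dimensional chain of states with unit spacing (so the Wasserstein-1 distance has a closed form via cumulative sums), estimates the stationary distribution from visit counts, and folds the distance into a tabular Q-learning step as a reward penalty. The function names, the weight `lam`, and the penalty form are all hypothetical choices made for illustration.

```python
from collections import Counter


def visitation_distribution(visited_states, n_states):
    """Empirical state-visitation frequencies from a list of state indices."""
    counts = Counter(visited_states)
    total = len(visited_states)
    return [counts.get(s, 0) / total for s in range(n_states)]


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two distributions on states 0..n-1.

    Assumes the states lie on a line with unit spacing, where the distance
    equals the sum of absolute differences between the two CDFs.
    """
    assert len(p) == len(q)
    dist, cdf_gap = 0.0, 0.0
    for p_i, q_i in zip(p[:-1], q[:-1]):
        cdf_gap += p_i - q_i
        dist += abs(cdf_gap)
    return dist


def q_update(Q, s, a, r, s_next, penalty, alpha=0.1, gamma=0.95, lam=1.0):
    """One tabular Q-learning step with a hypothetical penalty on the reward.

    `penalty` stands in for the Wasserstein term; subtracting lam * penalty
    from the reward is an assumption of this sketch, not the paper's rule.
    """
    target = (r - lam * penalty) + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


# Usage: compare observed visits on a 4-state chain with a reference
# distribution standing in for the predefined risk distribution.
visits = [0, 1, 1, 2, 2, 2, 2, 3]
d_pi = visitation_distribution(visits, n_states=4)
risk_dist = [0.25, 0.25, 0.25, 0.25]  # hypothetical expert-provided reference
w = wasserstein_1d(d_pi, risk_dist)

Q = [[0.0, 0.0] for _ in range(4)]
q_update(Q, s=2, a=1, r=1.0, s_next=3, penalty=w)
```

A two-dimensional Gridworld would need a general optimal transport solver (for example, a linear program over transport plans) in place of the closed-form 1-D distance used here.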
Taxonomy
Topics: Brain Tumor Detection and Classification · Fault Detection and Control Systems · Energy Efficient Wireless Sensor Networks
Methods: Q-Learning
