Optimal Transport-Assisted Risk-Sensitive Q-Learning

Zahra Shahrooei; Ali Baheri

arXiv:2406.11774 · cs.LG · September 13, 2024

Open Access

TL;DR

This paper introduces a risk-sensitive Q-learning algorithm that uses optimal transport theory to improve safety by reducing risky state visits and accelerating convergence in reinforcement learning tasks.

Contribution

It integrates optimal transport into Q-learning to explicitly incorporate safety preferences and improve risk management in decision-making policies.
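One hedged way to write the trade-off described here is as a penalized objective. The notation is assumed for illustration and is not taken from the paper: $\mu_\pi$ is the stationary state distribution of policy $\pi$, $\nu$ is the expert-specified risk distribution, $\beta > 0$ is a hypothetical trade-off weight, $d$ is a ground distance between states, and $\Pi(\mu_\pi, \nu)$ is the set of couplings with marginals $\mu_\pi$ and $\nu$.

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right]
  \;-\; \beta \, W_{1}(\mu_{\pi}, \nu),
\qquad
W_{1}(\mu_{\pi}, \nu) \;=\; \inf_{\kappa \in \Pi(\mu_{\pi}, \nu)}
  \mathbb{E}_{(s, s') \sim \kappa}\big[ d(s, s') \big]
```

The first term is the usual discounted return, and the second is the standard 1-Wasserstein distance. The paper may combine the two differently.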

Findings

01. Reduces visits to risky states in Gridworld environment

02. Achieves faster convergence compared to traditional Q-learning

03. Enhances safety in reinforcement learning policies

Abstract

The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance without considering risk or safety. In contrast, safe reinforcement learning aims to mitigate or avoid unsafe states. This paper presents a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance agent safety. By integrating optimal transport into the Q-learning framework, our approach seeks to optimize the policy's expected return while minimizing the Wasserstein distance between the policy's stationary distribution and a predefined risk distribution, which encapsulates safety preferences from domain experts. We validate the proposed algorithm in a Gridworld environment. The results indicate that our method significantly reduces the frequency of visits to risky states and achieves faster convergence to a stable policy compared to the…
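To make the idea in the abstract concrete, here is a minimal sketch of a tabular Q-learning step whose reward is penalized by a Wasserstein distance. It is not the authors' algorithm. It assumes states laid out on a 1-D chain with unit spacing (the paper uses a Gridworld), uses the running state-visit counts as a stand-in for the policy's stationary distribution, and introduces hypothetical names (`wasserstein_1d`, `q_update`, `beta`, `target`).

```python
import numpy as np

def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions on states 0..n-1.

    Assumes the states sit on a line with unit spacing, in which case
    W1 equals the summed absolute difference of the two CDFs.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize counts to probabilities
    q = q / q.sum()
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

def q_update(Q, s, a, r, s_next, visits, target,
             alpha=0.1, gamma=0.95, beta=1.0):
    """One Q-learning step with a Wasserstein penalty (illustrative only).

    visits: running state-visit counts, a proxy for the stationary distribution.
    target: expert-specified distribution over states (assumed given).
    beta:   hypothetical weight trading return against the distance penalty.
    """
    visits[s_next] += 1
    penalty = wasserstein_1d(visits, target)
    shaped_reward = r - beta * penalty
    td_target = shaped_reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return penalty

# Usage on a 3-state chain with 2 actions.
Q = np.zeros((3, 2))
visits = np.ones(3)                  # start from uniform pseudo-counts
target = np.array([0.5, 0.5, 0.0])   # expert prefers to avoid state 2
q_update(Q, s=0, a=1, r=1.0, s_next=2, visits=visits, target=target)
```

Penalizing the reward is only one way to couple the two objectives; the distance could equally be used to bias action selection or to adjust Q-values directly.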

Taxonomy

Topics: Brain Tumor Detection and Classification · Fault Detection and Control Systems · Energy Efficient Wireless Sensor Networks

Methods: Q-Learning