Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence

Minheng Xiao; Xian Yu; Lei Ying

arXiv:2405.14749·cs.LG·February 3, 2025

Open Access

TL;DR

This paper develops a new policy gradient method for risk-sensitive distributional reinforcement learning, providing theoretical guarantees and demonstrating its effectiveness in stochastic environments.

Contribution

It introduces a general analytical gradient formula for risk-sensitive DRL and proposes a categorical distributional policy gradient algorithm with convergence guarantees.

Findings

01. Effective risk-sensitive policy learning demonstrated in stochastic Cliffwalk and CartPole environments.

02. Finite-sample convergence guarantees under inexact policy evaluation.

03. Theoretical analysis of gradient computation for general coherent risk measures.

Abstract

Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in high-stakes applications. While traditional RL methods aim to learn a point estimate of the random cumulative cost, distributional RL (DRL) seeks to estimate its entire distribution, which leads to a unified framework for handling different risk measures. However, developing policy gradient methods for risk-sensitive DRL is inherently more complex, as it involves finding the gradient of a probability measure. This paper introduces a new policy gradient method for risk-sensitive DRL with general coherent risk measures, in which we provide an analytical form of the probability measure's gradient for any distribution. For practical use, we design a categorical distributional policy gradient algorithm (CDPG) that approximates any distribution by a categorical family supported on some fixed…
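
To make the idea of evaluating a risk measure on a categorical cost distribution concrete, here is a minimal sketch in Python. It is not the paper's CDPG algorithm. It assumes a fixed grid of atoms (the abstract is cut off before it specifies the support) and uses CVaR, a standard coherent risk measure, purely as an illustration. The function name `cvar_categorical` and its parameters are hypothetical.

```python
def cvar_categorical(atoms, probs, alpha):
    """CVaR at level alpha of a cost distribution on fixed atoms.

    Illustrative assumption: costs are "bad", so CVaR is the mean of the
    worst (largest-cost) 1 - alpha fraction of the probability mass.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if len(atoms) != len(probs):
        raise ValueError("atoms and probs must have the same length")

    tail_mass = 1.0 - alpha   # probability mass in the upper tail
    remaining = tail_mass     # mass still to be collected
    weighted_sum = 0.0

    # Walk from the largest cost downward, collecting tail mass.
    for cost, prob in sorted(zip(atoms, probs), reverse=True):
        take = min(prob, remaining)
        weighted_sum += take * cost
        remaining -= take
        if remaining <= 0.0:
            break

    return weighted_sum / tail_mass


# Hypothetical example: four equally likely cumulative costs.
atoms = [0.0, 1.0, 2.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]
risk_neutral = cvar_categorical(atoms, probs, alpha=0.0)  # plain expectation
risk_averse = cvar_categorical(atoms, probs, alpha=0.5)   # mean of worst half
```

With `alpha=0.0` the measure reduces to the expected cost, and raising `alpha` puts more weight on the high-cost tail, which is the sense in which a single distributional estimate can serve different risk preferences.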

Taxonomy

Topics: Supply Chain and Inventory Management · Reinforcement Learning in Robotics · Traffic control and management