Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence
Minheng Xiao, Xian Yu, Lei Ying

TL;DR
This paper develops a new policy gradient method for risk-sensitive distributional reinforcement learning, providing theoretical guarantees and demonstrating its effectiveness in stochastic environments.
Contribution
The paper introduces a general analytical gradient formula for risk-sensitive DRL and proposes a categorical distributional policy gradient algorithm (CDPG) with convergence guarantees.
Findings
Effective risk-sensitive policy learning demonstrated in stochastic Cliffwalk and CartPole environments.
Finite-sample convergence guarantees under inexact policy evaluation.
Theoretical analysis of gradient computation for general coherent risk measures.
Abstract
Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in high-stakes applications. While traditional RL methods aim to learn a point estimate of the random cumulative cost, distributional RL (DRL) seeks to estimate the entire distribution of it, which leads to a unified framework for handling different risk measures. However, developing policy gradient methods for risk-sensitive DRL is inherently more complex as it involves finding the gradient of a probability measure. This paper introduces a new policy gradient method for risk-sensitive DRL with general coherent risk measures, where we provide an analytical form of the probability measure's gradient for any distribution. For practical use, we design a categorical distributional policy gradient algorithm (CDPG) that approximates any distribution by a categorical family supported on some fixed…
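To make the idea of evaluating a risk measure on a categorical distribution concrete, here is a minimal sketch in Python. It is not the paper's CDPG algorithm. It assumes a cost distribution represented by probabilities on a fixed set of atoms, and it uses conditional value-at-risk (CVaR), one well-known coherent risk measure, via the Rockafellar-Uryasev form CVaR_alpha(Z) = min_t { t + E[(Z - t)^+] / (1 - alpha) }. The function name `categorical_cvar`, the cost convention (larger is worse), and the example numbers are illustrative assumptions.

```python
import numpy as np


def categorical_cvar(support, probs, alpha):
    """CVaR at level alpha of a categorical cost distribution (hypothetical helper).

    support: atom locations (costs; larger is worse, an assumption here)
    probs:   probability of each atom, summing to 1
    alpha:   risk level in [0, 1); alpha = 0 recovers the plain expectation
    """
    support = np.asarray(support, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if support.shape != probs.shape or support.ndim != 1:
        raise ValueError("support and probs must be 1-D arrays of equal length")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if not np.isclose(probs.sum(), 1.0) or np.any(probs < 0):
        raise ValueError("probs must be a valid probability vector")

    # The objective t + E[(Z - t)^+] / (1 - alpha) is convex and piecewise
    # linear in t with breakpoints at the atoms, so it suffices to evaluate
    # it at each atom. excess[i, j] = max(z_j - t_i, 0) with t_i = support[i].
    excess = np.maximum(support[None, :] - support[:, None], 0.0)
    objective = support + excess @ probs / (1.0 - alpha)
    return float(objective.min())


# Example: four equally likely costs.
atoms = [0.0, 1.0, 2.0, 3.0]
weights = [0.25, 0.25, 0.25, 0.25]
risk_neutral = categorical_cvar(atoms, weights, alpha=0.0)  # mean cost
risk_averse = categorical_cvar(atoms, weights, alpha=0.5)   # mean of worst half
```

With `alpha=0.0` the function returns the mean cost, and raising `alpha` puts more weight on the high-cost tail, which is the kind of objective a risk-sensitive policy gradient method would differentiate with respect to the policy parameters.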
Taxonomy
Topics: Supply Chain and Inventory Management · Reinforcement Learning in Robotics · Traffic control and management
