On the Global Convergence of Risk-Averse Natural Policy Gradient Methods with Expected Conditional Risk Measures

Xian Yu; Lei Ying

arXiv:2301.10932·cs.LG·January 21, 2026

TL;DR

This paper establishes global convergence guarantees for risk-averse natural policy gradient methods using Expected Conditional Risk Measures in reinforcement learning, supported by theoretical analysis and empirical testing.

Contribution

It introduces a risk-averse NPG algorithm with convergence guarantees for ECRM-based RL, extending policy gradient theory to risk-sensitive settings.

Findings

01. Proves global optimality of the risk-averse NPG algorithm under softmax parameterization with entropy regularization, for both exact and inexact policy evaluation.

02. Provides iteration complexity bounds for the proposed algorithm.

03. Demonstrates effectiveness on a stochastic Cliffwalk environment.
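
For orientation on the softmax and entropy-regularization setting named in the findings, here is a minimal sketch of a commonly used closed-form NPG update for a tabular softmax policy in the risk-neutral, reward-maximizing case. It is not the paper's ECRM-based algorithm: the risk measure is omitted entirely, and the function names `soft_q` and `npg_step`, the array shapes, and the step-size symbol `eta` are assumptions made for illustration.

```python
import numpy as np


def soft_q(P, r, pi, gamma, tau, iters=500):
    """Entropy-regularized Q-values of a fixed policy (illustrative helper).

    P:  (S, A, S) transition probabilities
    r:  (S, A) rewards
    pi: (S, A) strictly positive policy, rows sum to 1
    """
    Q = np.zeros_like(r, dtype=float)
    for _ in range(iters):
        # Soft state value: V(s) = sum_a pi(a|s) * (Q(s,a) - tau * log pi(a|s))
        V = np.sum(pi * (Q - tau * np.log(pi)), axis=1)
        # Bellman backup; P @ V contracts the next-state axis, giving (S, A)
        Q = r + gamma * (P @ V)
    return Q


def npg_step(pi, Q, eta, gamma, tau):
    """One entropy-regularized NPG step for a tabular softmax policy.

    Risk-neutral form (an assumption, not the paper's ECRM update):
    pi_new(a|s) is proportional to
        pi(a|s)^(1 - eta*tau/(1-gamma)) * exp(eta * Q(s,a) / (1-gamma)).
    """
    logits = (1.0 - eta * tau / (1.0 - gamma)) * np.log(pi) + eta * Q / (1.0 - gamma)
    # Subtract the row-wise max for numerical stability before exponentiating
    logits -= logits.max(axis=1, keepdims=True)
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```

Under these assumptions, choosing `eta = (1 - gamma) / tau` removes the dependence on the previous policy, so each step reduces to a softmax over `Q / tau`. A risk-averse variant would replace the expectation in the backup with a risk-adjusted operator, which this sketch does not attempt.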

Abstract

Risk-sensitive reinforcement learning (RL) has become a popular tool for controlling the risk of uncertain outcomes and ensuring reliable performance in highly stochastic sequential decision-making problems. While it has been shown that policy gradient methods can find globally optimal policies in the risk-neutral setting, it remains unclear if the risk-averse variants enjoy the same global convergence guarantees. In this paper, we consider a class of dynamic time-consistent risk measures, named Expected Conditional Risk Measures (ECRMs), and derive natural policy gradient (NPG) updates for ECRMs-based RL problems. We provide global optimality and iteration complexity of the proposed risk-averse NPG algorithm with softmax parameterization and entropy regularization under both exact and inexact policy evaluation. Furthermore, we test our risk-averse NPG algorithm on a stochastic…

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Test · Softmax · REINFORCE