On the Global Convergence of Risk-Averse Natural Policy Gradient Methods with Expected Conditional Risk Measures
Xian Yu, Lei Ying

TL;DR
This paper establishes global convergence guarantees for risk-averse natural policy gradient (NPG) methods under Expected Conditional Risk Measures (ECRMs) in reinforcement learning, and tests the resulting algorithm empirically.
Contribution
It introduces a risk-averse NPG algorithm with convergence guarantees for ECRM-based RL, extending policy gradient theory to risk-sensitive settings.
Findings
Proves global optimality of risk-averse NPG with softmax parameterization and entropy regularization, under both exact and inexact policy evaluation.
Provides iteration complexity bounds for the proposed algorithm.
Demonstrates effectiveness on a stochastic Cliffwalk environment.
Abstract
Risk-sensitive reinforcement learning (RL) has become a popular tool for controlling the risk of uncertain outcomes and ensuring reliable performance in highly stochastic sequential decision-making problems. While it has been shown that policy gradient methods can find globally optimal policies in the risk-neutral setting, it remains unclear if the risk-averse variants enjoy the same global convergence guarantees. In this paper, we consider a class of dynamic time-consistent risk measures, named Expected Conditional Risk Measures (ECRMs), and derive natural policy gradient (NPG) updates for ECRMs-based RL problems. We provide global optimality and iteration complexity of the proposed risk-averse NPG algorithm with softmax parameterization and entropy regularization under both exact and inexact policy evaluation. Furthermore, we test our risk-averse NPG algorithm on a stochastic…
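To make the NPG update mentioned in the abstract concrete, here is a minimal sketch of the standard risk-neutral entropy-regularized NPG step for a tabular softmax policy. It is not the paper's ECRM-based algorithm, whose risk-averse update rule is not reproduced in this summary. The random MDP (`P`, `r`), the sizes, the step size `eta`, and the helper names `soft_q` and `npg_step` are all assumptions made for illustration.

```python
import numpy as np

def soft_q(P, r, pi, gamma, tau):
    """Exact entropy-regularized Q-function of pi (P: (S,A,S), r and pi: (S,A))."""
    S = r.shape[0]
    # Per-state expected reward plus entropy bonus under pi.
    r_pi = np.sum(pi * (r - tau * np.log(pi)), axis=1)
    # State-to-state transition matrix induced by pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Solve the linear Bellman equation V = r_pi + gamma * P_pi V.
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ V

def npg_step(pi, Q, eta, gamma, tau):
    """One entropy-regularized NPG step, written as a multiplicative update in log space."""
    log_pi = (1 - eta * tau / (1 - gamma)) * np.log(pi) + eta * Q / (1 - gamma)
    log_pi -= log_pi.max(axis=1, keepdims=True)  # numerical stability
    new_pi = np.exp(log_pi)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Hypothetical random MDP, used only to exercise the update.
rng = np.random.default_rng(0)
S, A, gamma, tau = 4, 3, 0.9, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S), rows sum to 1
r = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)               # uniform initial policy
eta = 0.5 * (1 - gamma) / tau               # assumed step size

for _ in range(1000):
    pi = npg_step(pi, soft_q(P, r, pi, gamma, tau), eta, gamma, tau)
```

At a fixed point of this update the policy satisfies pi(a|s) ∝ exp(Q(s,a)/tau), which is the optimality condition of the entropy-regularized problem. A risk-averse variant would replace the expected-return objective with the risk measure; that step is not sketched here.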
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Test · Softmax · REINFORCE
