Risk-averse Total-reward MDPs with ERM and EVaR

Xihong Su; Julien Grand-Clément; Marek Petrik

arXiv:2408.17286 · cs.LG · July 15, 2025

TL;DR

This paper demonstrates that risk-averse total reward Markov Decision Processes with ERM and EVaR can be optimized using stationary policies, simplifying analysis and deployment in broad risk-averse reinforcement learning scenarios.

Contribution

The paper shows that risk-averse total-reward MDPs under ERM and EVaR admit optimal stationary policies, and it gives algorithms to compute them. The results apply to transient MDPs with both positive and negative rewards.

Findings

01. Exponential value iteration, policy iteration, and linear programming each compute optimal policies.

02. The results indicate that the total reward criterion may be preferable to the discounted criterion in a broad range of risk-averse RL domains.

03. The approach applies to MDPs with both positive and negative rewards, requiring only the relatively mild condition that the MDP is transient.
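The exponential value iteration named in the first finding can be illustrated on a toy problem. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a finite MDP in which any transition mass missing from a row goes to an absorbing zero-reward sink, it tracks m(s) = E[exp(-beta * total reward)] and minimizes it (which is equivalent to maximizing ERM), and it assumes beta is small enough for the iteration to converge. The names `P`, `r`, and `exponential_value_iteration` are hypothetical.

```python
import math

def exponential_value_iteration(P, r, beta, tol=1e-12, max_iter=100_000):
    """Toy ERM total-reward solver (illustrative sketch).

    P[s][a] is a list of transition probabilities to the transient states;
    whatever mass is missing from the row goes to an absorbing sink.
    r[s][a] is the reward collected when action a is taken in state s.
    Returns (ERM values per state, greedy stationary policy).
    """
    n = len(P)
    m = [1.0] * n  # horizon 0: total reward is 0, so exp(-beta * 0) = 1
    for _ in range(max_iter):
        new_m, policy = [], []
        for s in range(n):
            best_a, best_q = None, math.inf
            for a, row in enumerate(P[s]):
                sink = 1.0 - sum(row)  # probability of terminating now
                cont = sum(p * m[t] for t, p in enumerate(row)) + sink
                q = math.exp(-beta * r[s][a]) * cont
                if q < best_q:  # smaller m means larger ERM
                    best_a, best_q = a, q
            new_m.append(best_q)
            policy.append(best_a)
        gap = max(abs(x - y) for x, y in zip(new_m, m))
        m = new_m
        if gap < tol:
            return [-math.log(x) / beta for x in m], policy
    raise RuntimeError("no convergence; ERM may be unbounded for this beta")

# One state, two actions with the same expected total reward (-2):
# action 0 pays -2 and stops; action 1 pays -1 and continues w.p. 0.5.
values, policy = exponential_value_iteration(
    P=[[[0.0], [0.5]]], r=[[-2.0, -1.0]], beta=0.5
)
```

In this toy instance a risk-neutral agent would be indifferent between the two actions, while the ERM objective separates them. The returned policy is stationary: it depends only on the current state.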

Abstract

Optimizing risk-averse objectives in discounted MDPs is challenging because most models do not admit direct dynamic programming equations and require complex history-dependent policies. In this paper, we show that the risk-averse total reward criterion, under the Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR) risk measures, can be optimized by a stationary policy, making it simple to analyze, interpret, and deploy. We propose exponential value iteration, policy iteration, and linear programming to compute optimal policies. Compared with prior work, our results only require the relatively mild condition of transient MDPs and allow for both positive and negative rewards. Our results indicate that the total reward criterion may be preferable to the discounted criterion in a broad range of risk-averse reinforcement learning domains.
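For readers unfamiliar with the two risk measures named in the abstract, the sketch below evaluates them on a discrete reward distribution. It is an illustration under assumptions rather than code from the paper: it takes ERM with risk parameter beta > 0 to be -(1/beta) * log E[exp(-beta * X)] for a reward X, and it assumes the convention EVaR_alpha[X] = sup over beta > 0 of ERM_beta[X] + log(alpha) / beta with alpha in (0, 1). Conventions for the risk level differ across papers. The supremum is approximated on a finite grid, and the function names `erm` and `evar` are hypothetical.

```python
import math

def erm(values, probs, beta):
    """Entropic risk measure of a discrete reward, for beta > 0."""
    exps = [-beta * x for x in values]
    hi = max(exps)  # log-sum-exp shift for numerical stability
    s = sum(p * math.exp(e - hi) for p, e in zip(probs, exps))
    return -(hi + math.log(s)) / beta

def evar(values, probs, alpha, betas):
    """Grid approximation of EVaR (assumed convention, alpha in (0, 1))."""
    return max(erm(values, probs, b) + math.log(alpha) / b for b in betas)

# A fair coin flip between rewards 0 and 10 (mean 5).
values, probs = [0.0, 10.0], [0.5, 0.5]
betas = [0.01 * 1.5 ** k for k in range(30)]  # hypothetical beta grid
mean = sum(p * x for p, x in zip(probs, values))
risk_erm = erm(values, probs, beta=1.0)
risk_evar_mild = evar(values, probs, alpha=0.9, betas=betas)
risk_evar_strict = evar(values, probs, alpha=0.1, betas=betas)
```

Under these conventions, both measures lie below the mean for a non-constant reward, and a smaller alpha gives a more conservative EVaR.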

Code & Models

Repositories

suxh2019/ermlp (Official)

Videos

Risk-averse Total-reward MDPs with ERM and EVaR (Underline)

Taxonomy

Topics: Auction Theory and Applications