Markov Decision Processes with Value-at-Risk Criterion

Li Xia; Jinyan Pan

arXiv:2507.22355 · math.OC · July 31, 2025

TL;DR

This paper studies Value-at-Risk (VaR) optimization in Markov decision processes. It recasts VaR maximization as a series of probabilistic minimization MDPs and provides algorithms with proven convergence.

Contribution

It introduces a bilevel optimization approach for VaR in MDPs, establishing policy optimality conditions and developing convergent algorithms for both steady-state and finite-horizon scenarios.

Findings

1. Efficient algorithms for VaR maximization in MDPs.

2. Proven optimality of deterministic policies for steady-state VaR.

3. Numerical experiments demonstrate practical applicability.

Abstract

Value-at-risk (VaR), also known as the quantile, is a crucial risk measure in finance and other fields. However, optimizing VaR metrics in Markov decision processes (MDPs) is challenging because VaR is non-additive, so traditional dynamic programming is inapplicable. This paper conducts a comprehensive study of VaR optimization in discrete-time finite MDPs. We consider VaR in two key scenarios: the VaR of steady-state rewards over an infinite horizon and the VaR of accumulated rewards over a finite horizon. By establishing the equivalence between the VaR maximization MDP and a series of probabilistic minimization MDPs, we transform the VaR maximization MDP into a constrained bilevel optimization problem. The inner level is a policy optimization that minimizes the probability that MDP rewards fall below a target $\lambda$, while the outer level is a single-parameter optimization of…
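The bilevel structure described in the abstract can be illustrated on a toy steady-state problem. The sketch below is not the paper's algorithm: it replaces both levels with brute-force enumeration, and it makes several assumptions that the abstract does not state. VaR at level alpha is taken to be the lower quantile inf{x : P(X <= x) >= alpha}, the inner objective is taken to be the steady-state probability P(reward < lam), every deterministic policy is assumed to induce an ergodic chain, and the arrays `P` and `r` and all function names are hypothetical. Under these assumptions, a policy's VaR is at least lam exactly when P(reward < lam) < alpha, so the largest feasible target equals the optimal VaR.

```python
import itertools
import numpy as np

TOL = 1e-9  # guards the alpha comparisons against floating-point noise

def stationary(P_pi):
    """Stationary distribution of an ergodic chain: pi P = pi, sum(pi) = 1."""
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones((1, n))])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def induced_chain(P, r, policy):
    """Stationary distribution and per-state rewards under a deterministic policy."""
    states = range(r.shape[0])
    P_pi = np.array([P[policy[s], s] for s in states])
    r_pi = np.array([r[s, policy[s]] for s in states])
    return stationary(P_pi), r_pi

def quantile(values, probs, alpha):
    """Assumed VaR definition: inf{x : P(X <= x) >= alpha}."""
    order = np.argsort(values)
    cdf = np.cumsum(probs[order])
    idx = min(np.searchsorted(cdf, alpha - TOL), len(values) - 1)
    return float(values[order][idx])

def policies(r):
    """All deterministic policies as tuples (action chosen in each state)."""
    n_states, n_actions = r.shape
    return itertools.product(range(n_actions), repeat=n_states)

def direct_var(P, r, alpha):
    """Enumerate policies and maximize the steady-state VaR directly."""
    best_value, best_policy = -np.inf, None
    for policy in policies(r):
        pi, r_pi = induced_chain(P, r, policy)
        value = quantile(r_pi, pi, alpha)
        if value > best_value:
            best_value, best_policy = value, policy
    return best_value, best_policy

def bilevel_var(P, r, alpha):
    """Outer: scan targets lam from high to low. Inner: minimize P(reward < lam)."""
    for lam in sorted(np.unique(r), reverse=True):
        best_prob, best_policy = np.inf, None
        for policy in policies(r):
            pi, r_pi = induced_chain(P, r, policy)
            prob = float(pi[r_pi < lam].sum())
            if prob < best_prob:
                best_prob, best_policy = prob, policy
        if best_prob < alpha - TOL:
            return float(lam), best_policy
    raise ValueError("alpha must be positive")

# Hypothetical toy MDP: P[a, s, s'] are transition probabilities, r[s, a] rewards.
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [2.0, 3.0]])
```

Calling `direct_var(P, r, 0.2)` and `bilevel_var(P, r, 0.2)` on this toy instance should return the same VaR value, which is the point of the equivalence. Enumeration is exponential in the number of states, so this only serves to make the inner and outer levels concrete.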
