Markov Decision Processes with Value-at-Risk Criterion
Li Xia, Jinyan Pan

TL;DR
This paper develops a novel framework for optimizing Value-at-Risk (VaR) in Markov decision processes, transforming the problem into probabilistic minimization MDPs and providing algorithms with proven convergence.
Contribution
It introduces a bilevel optimization approach for VaR in MDPs, establishing policy optimality conditions and developing convergent algorithms for both steady-state and finite-horizon scenarios.
Findings
Efficient algorithms for VaR maximization in MDPs.
Proven optimality of deterministic policies for steady-state VaR.
Numerical experiments demonstrate practical applicability.
Abstract
Value-at-risk (VaR), also known as the quantile, is a crucial risk measure in finance and other fields. However, optimizing VaR metrics in Markov decision processes (MDPs) is challenging because VaR is non-additive and traditional dynamic programming is inapplicable. This paper conducts a comprehensive study on VaR optimization in discrete-time finite MDPs. We consider VaR in two key scenarios: the VaR of steady-state rewards over an infinite horizon and the VaR of accumulated rewards over a finite horizon. By establishing the equivalence between the VaR maximization MDP and a series of probabilistic minimization MDPs, we transform the VaR maximization MDP into a constrained bilevel optimization problem. The inner level is a policy optimization that minimizes the probability that MDP rewards fall below a target, while the outer level is a single-parameter optimization of…
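The bilevel idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it uses a hypothetical 2-state, 2-action MDP, enumerates all deterministic policies by brute force, and assumes the lower-quantile convention for VaR (the smallest y with P(X <= y) >= alpha). Under that assumed convention, the steady-state VaR is at least y exactly when the probability of the reward falling below y is less than alpha, so an outer search over the target y combined with an inner minimization of that probability should recover the same value as maximizing VaR directly. The names `P`, `R`, `ALPHA`, and all helper functions are illustrative, and the paper's own formulation of the outer level may differ.

```python
from itertools import product

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[s][a] is the next-state distribution, R[s][a] the one-step reward.
P = [
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.5, 0.5], [0.2, 0.8]],
]
R = [
    [1.0, 2.0],
    [0.0, 3.0],
]
ALPHA = 0.25  # assumed probability level
EPS = 1e-12   # tolerance for floating-point comparisons


def stationary_distribution(chain, iters=2000):
    """Approximate the stationary distribution of an irreducible chain.

    Uses the lazy update pi <- 0.5*pi + 0.5*pi*P, which also converges
    for periodic chains.
    """
    n = len(chain)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * chain[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
    return pi


def steady_state_rewards(policy):
    """Return (rewards, probabilities) of the steady-state reward under a policy."""
    chain = [P[s][policy[s]] for s in range(len(P))]
    rewards = [R[s][policy[s]] for s in range(len(P))]
    return rewards, stationary_distribution(chain)


def var(rewards, probs, alpha):
    """Lower alpha-quantile: smallest y with P(X <= y) >= alpha (assumed convention)."""
    cum = 0.0
    for y, p in sorted(zip(rewards, probs)):
        cum += p
        if cum >= alpha - EPS:
            return y
    return max(rewards)


def prob_below(rewards, probs, target):
    """Probability that the steady-state reward is strictly below the target."""
    return sum(p for r, p in zip(rewards, probs) if r < target)


policies = list(product(range(2), repeat=2))

# Direct approach: maximize the steady-state VaR over deterministic policies.
var_by_policy = {d: var(*steady_state_rewards(d), ALPHA) for d in policies}
best_policy = max(var_by_policy, key=var_by_policy.get)
best_var = var_by_policy[best_policy]


def min_prob_below(target):
    """Inner level: smallest achievable probability of falling below the target."""
    return min(prob_below(*steady_state_rewards(d), target) for d in policies)


# Outer level: largest candidate target whose inner optimum stays below ALPHA.
candidates = sorted({r for row in R for r in row})
bilevel_var = max(y for y in candidates if min_prob_below(y) < ALPHA - EPS)

print(best_policy, best_var, bilevel_var)  # prints: (1, 1) 2.0 2.0
```

In this toy instance both routes agree on a VaR of 2.0, attained by the policy that picks action 1 in both states. Enumeration only works for very small problems. The inner problem for a fixed target is an ordinary average-cost MDP with an indicator cost, which is what makes standard policy optimization applicable at that level.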
