Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion
Li Xia, Peter W. Glynn

TL;DR
This paper develops a framework for optimizing the long-run CVaR criterion in Markov decision processes, introducing a CVaR difference formula, optimality conditions, and a policy iteration type algorithm, with applications in portfolio management.
Contribution
It introduces a pseudo CVaR metric, a CVaR difference formula, and a policy iteration type algorithm for long-run CVaR optimization in MDPs, a setting where CVaR does not fit the standard MDP form and the principle of dynamic programming fails.
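To make the role of a pseudo CVaR concrete, the sketch below uses the standard Rockafellar-Uryasev representation CVaR_α(X) = min_y { y + E[(X - y)^+] / (1 - α) }. Assumptions (not taken from the paper): X is a cost, so CVaR is the mean of the upper tail; the distribution is finite and discrete; and "pseudo CVaR" is read as this function evaluated at a fixed threshold y instead of at the VaR. The paper's exact definition may differ. The helper names `pseudo_cvar` and `var_cvar` are hypothetical.

```python
import numpy as np

def pseudo_cvar(values, probs, y, alpha):
    """y + E[(X - y)^+] / (1 - alpha) for a discrete cost X at a fixed threshold y.
    Assumption: this Rockafellar-Uryasev form stands in for the paper's pseudo CVaR."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return y + np.dot(probs, np.maximum(values - y, 0.0)) / (1.0 - alpha)

def var_cvar(values, probs, alpha):
    """VaR (lower alpha-quantile) and CVaR (upper-tail mean) of a discrete cost X."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(values)
    v, p = values[order], probs[order]
    cdf = np.cumsum(p)
    # first index whose CDF reaches alpha (small tolerance absorbs rounding)
    idx = min(int(np.searchsorted(cdf, alpha - 1e-12)), len(v) - 1)
    var = v[idx]
    # CVaR is the minimum of pseudo_cvar over y, and y = VaR attains it
    return var, pseudo_cvar(v, p, var, alpha)

costs, probs, alpha = [0.0, 10.0, 100.0], [0.5, 0.4, 0.1], 0.9
var, cvar = var_cvar(costs, probs, alpha)
for y in [0.0, var, 50.0]:
    # for any threshold y, the pseudo CVaR is an upper bound on the true CVaR
    print(y, pseudo_cvar(costs, probs, y, alpha), cvar)
```

In this toy distribution VaR_0.9 = 10 and CVaR_0.9 = 100 (the worst 10% of outcomes all cost 100), while the pseudo CVaR at y = 0 is 140. Fixing y turns the hard, non-additive CVaR into an expectation of a per-outcome function, which is what makes a pseudo metric convenient for comparing policies.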
Findings
Derived a CVaR difference formula for policy comparison.
Established a Bellman local optimality equation for CVaR.
Developed a convergent policy iteration algorithm for CVaR optimization.
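The policy iteration idea can be illustrated with a minimal sketch, which is not the authors' algorithm. Assumptions: a finite unichain MDP with transitions `P[a, s, s']` and one-step costs `c[s, a]`; deterministic policies; and the long-run CVaR taken as the CVaR of the per-stage cost under the stationary distribution. For a fixed threshold y, the pseudo CVaR y + E_π[(c - y)^+] / (1 - α) equals the long-run average of the modified cost f(s, a) = y + (c(s, a) - y)^+ / (1 - α), so it can be improved by ordinary average-cost policy iteration. Setting y to the current policy's VaR gives CVaR(d') ≤ pseudo(d', y) ≤ pseudo(d, y) = CVaR(d), so the CVaR never increases. All function names are hypothetical.

```python
import numpy as np

def var_cvar(values, probs, alpha):
    """VaR and CVaR (upper tail of a cost) of a finite discrete distribution."""
    order = np.argsort(values)
    v, p = np.asarray(values, float)[order], np.asarray(probs, float)[order]
    cdf = np.cumsum(p)
    idx = min(int(np.searchsorted(cdf, alpha - 1e-12)), len(v) - 1)
    var = v[idx]
    return var, var + np.dot(p, np.maximum(v - var, 0.0)) / (1.0 - alpha)

def stationary_dist(Pd):
    """Stationary distribution of an (assumed unichain) transition matrix."""
    n = Pd.shape[0]
    A = np.vstack([Pd.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def long_run_var_cvar(P, c, d, alpha):
    """Long-run VaR/CVaR of the per-stage cost under deterministic policy d."""
    rows = np.arange(len(d))
    pi = stationary_dist(P[d, rows])
    return var_cvar(c[rows, d], pi, alpha)

def evaluate_average(Pd, fd):
    """Solve g + h(s) = fd(s) + sum_s' Pd[s, s'] h(s') with h(0) = 0."""
    n = len(fd)
    M = np.zeros((n, n))
    M[:, 0] = 1.0                        # column for the average cost g
    M[:, 1:] = (np.eye(n) - Pd)[:, 1:]   # columns for h(1), ..., h(n-1)
    sol = np.linalg.solve(M, fd)
    return sol[0], np.concatenate([[0.0], sol[1:]])

def average_cost_pi(P, f, d, max_iter=1000):
    """Standard average-cost policy iteration for a fixed cost table f[s, a]."""
    rows = np.arange(len(d))
    for _ in range(max_iter):
        _, h = evaluate_average(P[d, rows], f[rows, d])
        Q = f + np.einsum("ast,t->sa", P, h)
        best = Q.argmin(axis=1)
        # switch action only on strict improvement, to avoid cycling on ties
        new_d = np.where(Q[rows, best] < Q[rows, d] - 1e-12, best, d)
        if np.array_equal(new_d, d):
            break
        d = new_d
    return d

def cvar_policy_iteration(P, c, alpha, d0, max_outer=100):
    """Fix y = VaR of the current policy, improve the pseudo CVaR as an
    average-cost MDP, then update y; stop when CVaR no longer strictly drops."""
    d = np.asarray(d0)
    var, cvar = long_run_var_cvar(P, c, d, alpha)
    for _ in range(max_outer):
        f = var + np.maximum(c - var, 0.0) / (1.0 - alpha)  # pseudo CVaR stage cost
        new_d = average_cost_pi(P, f, d)
        new_var, new_cvar = long_run_var_cvar(P, c, new_d, alpha)
        if np.array_equal(new_d, d) or new_cvar > cvar - 1e-12:
            break  # no strict improvement: possibly only a local optimum
        d, var, cvar = new_d, new_var, new_cvar
    return d, cvar

P = np.array([[[0.9, 0.1], [0.9, 0.1]],   # action 0: drift toward state 0
              [[0.1, 0.9], [0.1, 0.9]]])  # action 1: drift toward state 1
c = np.array([[3.0, 0.0], [10.0, 10.0]])  # c[s, a]
d, cvar = cvar_policy_iteration(P, c, alpha=0.5, d0=[1, 0])
```

On this hypothetical two-state example, starting from `d0 = [1, 0]` the sketch moves to `d = [0, 0]` and lowers the long-run CVaR at α = 0.5 from 10 to 4.4. Because each step only linearizes around the current VaR, the loop can stop at a local optimum: if y equals the largest cost, every modified cost collapses to y and no action looks better. This is consistent with the abstract's remark that the local optimality condition is only necessary for global optimality.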
Abstract
CVaR (Conditional Value at Risk) is a risk metric widely used in finance. However, dynamically optimizing CVaR is difficult since it is not a standard Markov decision process (MDP) and the principle of dynamic programming fails. In this paper, we study the infinite-horizon discrete-time MDP with a long-run CVaR criterion, from the view of sensitivity-based optimization. By introducing a pseudo CVaR metric, we derive a CVaR difference formula which quantifies the difference of long-run CVaR under any two policies. The optimality of deterministic policies is derived. We obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for local optimal policies and only necessary for global optimal policies. A CVaR derivative formula is also derived for providing more sensitivity information. Then we develop a policy iteration type algorithm to…
Taxonomy
Topics: Risk and Portfolio Optimization · Economic theories and models
