Risk-Averse Markov Decision Processes through a Distributional Lens
Ziteng Cheng, Sebastian Jaimungal

TL;DR
This paper introduces a distributional approach to risk-averse Markov decision processes that allows risk aversion to change dynamically, and it establishes a dynamic programming principle and the existence of optimal policies.
Contribution
The paper constructs dynamic risk measures at the distributional level from law-invariant convex risk measures, applies them to MDPs, and proves a dynamic programming principle and the existence of optimal policies under mild assumptions.
Findings
The proposed dynamic risk measures allow risk aversion to change over time.
Existence of an optimal policy is shown for both finite and infinite time horizons.
Examples are given for optimal liquidation with limit order books and for autonomous driving.
Abstract
By adopting a distributional viewpoint on law-invariant convex risk measures, we construct dynamic risk measures (DRMs) at the distributional level. We then apply these DRMs to investigate Markov decision processes, incorporating latent costs, random actions, and weakly continuous transition kernels. Furthermore, the proposed DRMs allow risk aversion to change dynamically. Under mild assumptions, we derive a dynamic programming principle and show the existence of an optimal policy in both finite and infinite time horizons. Moreover, we provide a sufficient condition for the optimality of deterministic actions. For illustration, we conclude the paper with examples from optimal liquidation with limit order books and autonomous driving.
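To make the idea of a risk-averse dynamic programming recursion concrete, here is a minimal Python sketch. It is not the paper's construction: it assumes a finite state and action space, fully observed costs, deterministic actions, and it swaps in CVaR (one familiar law-invariant convex risk measure) as the one-step risk map. The names `cvar`, `risk_averse_dp`, and the toy two-state model are hypothetical and chosen only for illustration. The per-step level `alphas[t]` is allowed to differ across time steps, which mimics risk aversion that changes dynamically.

```python
def cvar(values, probs, alpha):
    """CVaR at level alpha in [0, 1) of a discrete cost distribution.

    Uses the Rockafellar-Uryasev form min_q { q + E[(X - q)^+] / (1 - alpha) }.
    The objective is convex and piecewise linear in q, so it is enough to
    evaluate it at the support points. alpha = 0 gives the plain expectation.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return min(
        q + sum(p * max(v - q, 0.0) for v, p in zip(values, probs)) / (1.0 - alpha)
        for q in values
    )


def risk_averse_dp(P, cost, alphas, terminal):
    """Backward induction with a nested one-step risk map.

    P[a][s]   : list of transition probabilities from state s under action a
    cost[s][a]: immediate cost of action a in state s
    alphas[t] : CVaR level used at time t (may change with t)
    terminal  : terminal cost per state
    Returns the time-0 value function and the policy as policy[t][s].
    """
    n_states, n_actions = len(terminal), len(P)
    V = list(terminal)
    policy = []
    for alpha in reversed(alphas):
        new_V, pi = [], []
        for s in range(n_states):
            # Cost now plus the risk of the next-step value under P(. | s, a).
            q_vals = [cost[s][a] + cvar(V, P[a][s], alpha) for a in range(n_actions)]
            best = min(range(n_actions), key=q_vals.__getitem__)
            pi.append(best)
            new_V.append(q_vals[best])
        V = new_V
        policy.append(pi)
    policy.reverse()
    return V, policy


# Toy model (illustrative only): state 0 is "ok", state 1 is an absorbing "bad" state.
# Action 0 is "safe" (cost 2, stays in state 0); action 1 is "risky"
# (cost 0, moves to the bad state with probability 0.1).
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.9, 0.1], [0.0, 1.0]],  # action 1
]
cost = [[2.0, 0.0], [0.0, 0.0]]
terminal = [0.0, 10.0]

V_neutral, pi_neutral = risk_averse_dp(P, cost, [0.0], terminal)
V_averse, pi_averse = risk_averse_dp(P, cost, [0.9], terminal)
V_mixed, pi_mixed = risk_averse_dp(P, cost, [0.9, 0.0], terminal)
print(pi_neutral, pi_averse, pi_mixed)
```

In this toy model the risk-neutral agent picks the risky action in state 0, while the agent with level 0.9 picks the safe one. With `alphas = [0.9, 0.0]` the agent is cautious at the first step and risk-neutral at the second, so the chosen action in state 0 differs between the two steps.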
Taxonomy
Topics: Risk and Portfolio Optimization · Insurance, Mortality, Demography, Risk Management · Insurance and Financial Risk Management
