On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes
Mathieu Godbout, Audrey Durand

TL;DR
This paper investigates the limitations of dual static CVaR decompositions in Markov Decision Processes, revealing fundamental issues in policy evaluation and the impossibility of a universally optimal policy across all risk levels.
Contribution
It introduces risk-assignment consistency constraints, explains evaluation errors via the CVaR evaluation gap, and proves the fundamental limits of finding a single optimal policy for all risk levels.
Findings
Evaluation errors stem from a non-zero CVaR evaluation gap.
The dual CVaR decomposition cannot guarantee a policy that is optimal across all risk levels.
The paper identifies an MDP in which no single policy is optimal across all risk levels.
Abstract
It was recently shown that dynamic programming (DP) methods for finding static CVaR-optimal policies in Markov Decision Processes (MDPs) can fail when based on the dual formulation, yet the root cause of this failure remains unclear. We expand on these findings by shifting focus from policy optimization to the seemingly simpler task of policy evaluation. We show that evaluating the static CVaR of a given policy can be framed as two distinct minimization problems. We introduce a set of "risk-assignment consistency constraints" that must be satisfied for their solutions to match and we demonstrate that an empty intersection of these constraints is the source of previously observed evaluation errors. Quantifying the evaluation error as the CVaR evaluation gap, we demonstrate that the issues observed when optimizing over the dual-based CVaR DP are explained by the returned policy…
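For readers unfamiliar with static CVaR, the following minimal sketch computes it for a discrete return distribution in two textbook ways: the dual (risk-envelope) form and the Rockafellar-Uryasev form. It is only an illustration of the quantity being evaluated, not the paper's DP decomposition and not necessarily the two minimization problems the abstract refers to. The function names `cvar_dual` and `cvar_primal`, the example distribution, and the lower-tail convention for returns (smaller is worse) are all assumptions made for this example.

```python
def cvar_dual(values, probs, alpha):
    """Lower-tail CVaR via the dual form (assumed convention: small returns are bad).

    Solves min E[xi * Z] subject to 0 <= xi <= 1/alpha and E[xi] = 1.
    The minimiser puts the maximal weight 1/alpha on the worst outcomes first.
    """
    remaining = alpha  # probability mass still to be assigned to the tail
    total = 0.0
    for z, p in sorted(zip(values, probs)):
        mass = min(p, remaining)
        total += mass * z
        remaining -= mass
        if remaining <= 0:
            break
    return total / alpha


def cvar_primal(values, probs, alpha):
    """Lower-tail CVaR via the Rockafellar-Uryasev form.

    Solves max over s of s - E[(s - Z)^+] / alpha. For a discrete
    distribution the objective is piecewise linear and concave in s,
    so it is enough to check the atoms of the distribution.
    """
    def objective(s):
        shortfall = sum(p * max(s - z, 0.0) for z, p in zip(values, probs))
        return s - shortfall / alpha

    return max(objective(s) for s in values)


# Hypothetical return distribution of a fixed policy.
returns = [0.0, 1.0, 10.0]
probabilities = [0.1, 0.4, 0.5]

for level in (0.25, 1.0):
    print(level, cvar_dual(returns, probabilities, level),
          cvar_primal(returns, probabilities, level))
```

On a single distribution like this one the two forms agree. At risk level 1.0 both reduce to the expected return.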
