Planning for Risk-Aversion and Expected Value in MDPs
Marc Rigter, Paul Duckworth, Bruno Lacerda, Nick Hawes

TL;DR
This paper introduces a lexicographic approach to planning in MDPs that first guarantees optimal CVaR of the total cost and then minimizes expected cost among the CVaR-optimal policies, improving average performance without sacrificing risk aversion.
Contribution
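It proposes a lexicographic method that minimizes expected cost subject to the constraint that the CVaR of the total cost is optimal. This addresses a limitation of existing risk-averse planning, where optimizing CVaR alone can yield poor performance in expectation.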
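The objective can be written as a two-stage (lexicographic) optimization. The block below is our own formalization rather than the paper's notation: it assumes $C^\pi$ is the total cost of a run under policy $\pi$, $\alpha \in (0,1]$ is the fraction of worst-case outcomes that CVaR averages over, and $(x)^+ = \max(x, 0)$. It uses the standard Rockafellar–Uryasev form of CVaR for costs; the paper's own conventions may differ (for example, by parameterizing the tail as $1-\alpha$).

```latex
\begin{aligned}
\mathrm{CVaR}_\alpha(C^\pi) &= \min_{\nu \in \mathbb{R}} \Big\{ \nu + \tfrac{1}{\alpha}\,\mathbb{E}\big[(C^\pi - \nu)^+\big] \Big\}, \\
\Pi^{*} &= \operatorname*{arg\,min}_{\pi} \, \mathrm{CVaR}_\alpha(C^\pi), \\
\pi^{\mathrm{lex}} &\in \operatorname*{arg\,min}_{\pi \in \Pi^{*}} \, \mathbb{E}\big[C^\pi\big].
\end{aligned}
```

When $\Pi^{*}$ contains more than one policy, which the paper shows can happen, the second stage selects the policy that is cheapest on average. A planner that optimizes CVaR alone may return any member of $\Pi^{*}$.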
Findings
The approach achieves lower expected cost than state-of-the-art methods.
It guarantees optimal CVaR in the solutions.
Evaluations on four domains validate the effectiveness of the method.
Abstract
Planning in Markov decision processes (MDPs) typically optimises the expected cost. However, optimising the expectation does not consider the risk that for any given run of the MDP, the total cost received may be unacceptably high. An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR). However, optimising the CVaR alone may result in poor performance in expectation. In this work, we begin by showing that there can be multiple policies which obtain the optimal CVaR. This motivates us to propose a lexicographic approach which minimises the expected cost subject to the constraint that the CVaR of the total cost is optimal. We present an algorithm for this problem and evaluate our approach on four domains. Our results demonstrate that our lexicographic approach improves the expected cost compared to the state of the art…
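To make the lexicographic objective concrete, here is a toy Python sketch that treats each policy as a black box with a known distribution over total cost. It does not implement the paper's planning algorithm, which operates on the MDP itself. The policy names A, B and C, their cost distributions, and the helpers `expected_cost`, `cvar` and `lexicographic_choice` are all hypothetical. Here, CVaR at level `alpha` is taken to be the mean cost over the worst `alpha`-fraction of outcomes.

```python
import math
from typing import Dict, List, Tuple

# A discrete distribution over total cost: list of (cost, probability) pairs.
Distribution = List[Tuple[float, float]]


def expected_cost(dist: Distribution) -> float:
    """Expected total cost."""
    return sum(cost * prob for cost, prob in dist)


def cvar(dist: Distribution, alpha: float) -> float:
    """Mean cost over the worst alpha-fraction of outcomes (assumes 0 < alpha <= 1)."""
    remaining = alpha
    tail_total = 0.0
    # Walk outcomes from most to least costly, absorbing probability mass until alpha is used up.
    for cost, prob in sorted(dist, key=lambda cp: cp[0], reverse=True):
        if remaining <= 0.0:
            break
        mass = min(prob, remaining)
        tail_total += cost * mass
        remaining -= mass
    return tail_total / alpha


def lexicographic_choice(policies: Dict[str, Distribution], alpha: float,
                         rel_tol: float = 1e-9) -> str:
    """Among CVaR-optimal policies (up to rel_tol), return the one with lowest expected cost."""
    cvars = {name: cvar(dist, alpha) for name, dist in policies.items()}
    best = min(cvars.values())
    # Stage 1: keep every policy whose CVaR matches the optimum.
    optimal = [name for name, value in cvars.items()
               if math.isclose(value, best, rel_tol=rel_tol)]
    # Stage 2: break ties by expected cost.
    return min(optimal, key=lambda name: expected_cost(policies[name]))


# Hypothetical toy policies and their total-cost distributions.
policies = {
    "A": [(10.0, 1.0)],              # always costs 10
    "B": [(10.0, 0.2), (2.0, 0.8)],  # same worst 20% tail as A, cheaper on average
    "C": [(30.0, 0.1), (0.0, 0.9)],  # cheapest on average, heavier tail
}
alpha = 0.2

risk_neutral = min(policies, key=lambda name: expected_cost(policies[name]))
print(risk_neutral)                           # C
print(lexicographic_choice(policies, alpha))  # B
```

With `alpha = 0.2`, policies A and B both have CVaR 10, so a CVaR-only planner could return either one. The lexicographic rule breaks the tie in favour of B, whose expected cost is 3.6 compared with 10 for A. A risk-neutral planner picks C despite its heavier tail (CVaR 15). The relative tolerance in `lexicographic_choice` stands in for the exact optimality constraint, because CVaR values computed in floating point may differ slightly.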
Taxonomy
Topics: Formal Methods in Verification · Bayesian Modeling and Causal Inference · Machine Learning and Algorithms
