Stochastic dominance-constrained Markov decision processes
William B. Haskell, Rahul Jain

TL;DR
This paper introduces a linear programming approach to solving Markov decision processes with risk constraints expressed as stochastic dominance constraints. It covers both the average and the discounted reward settings and includes a portfolio optimization example.
Contribution
It develops a new linear programming formulation for stochastic dominance-constrained MDPs. It also derives the dual dynamic programming optimality equations, which contain a new pricing term for the dominance constraint, and shows that the approach extends to other stochastic orders.
Findings
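Increasing concave stochastic dominance constraints on the empirical reward distribution become linear constraints on occupation measures.
Optimal policies are obtained by solving linear programs that incorporate the dominance constraints.
The approach is demonstrated on a portfolio optimization application.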
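Why the dominance constraint becomes linear can be illustrated with a standard average-reward occupation-measure LP. This is a minimal sketch, not necessarily the paper's exact formulation. It assumes a finite unichain MDP with states S, actions A, transition kernel P(j | s, a), reward r(s, a), and a benchmark random variable Y with finitely many realizations y_1, ..., y_m. All of this notation is ours. It uses two facts. First, X dominates Y in the increasing concave order iff E[(eta - X)_+] <= E[(eta - Y)_+] for every eta. Second, when Y has finite support, it suffices to check eta = y_i. Under occupation measure mu, the left side is a weighted sum of the constants (y_i - r(s,a))_+, so each check is one linear inequality in mu.

```latex
\begin{aligned}
\max_{\mu \ge 0}\quad & \sum_{s \in S}\sum_{a \in A} r(s,a)\,\mu(s,a) \\
\text{s.t.}\quad & \sum_{a \in A} \mu(j,a) = \sum_{s \in S}\sum_{a \in A} P(j \mid s,a)\,\mu(s,a), && j \in S, \\
& \sum_{s \in S}\sum_{a \in A} \mu(s,a) = 1, \\
& \sum_{s \in S}\sum_{a \in A} \bigl(y_i - r(s,a)\bigr)_+\,\mu(s,a) \le \mathbb{E}\bigl[(y_i - Y)_+\bigr], && i = 1,\dots,m.
\end{aligned}
```

Given an optimal mu, a stationary policy can be read off as pi(a | s) = mu(s, a) / sum_{a'} mu(s, a') wherever the denominator is positive.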
Abstract
We are interested in risk constraints for infinite horizon discrete time Markov decision processes (MDPs). Starting with average reward MDPs, we show that increasing concave stochastic dominance constraints on the empirical distribution of reward lead to linear constraints on occupation measures. The optimal policy for the resulting class of dominance-constrained MDPs is obtained by solving a linear program. We compute the dual of this linear program to obtain average dynamic programming optimality equations that reflect the dominance constraint. In particular, a new pricing term appears in the optimality equations corresponding to the dominance constraint. We show that many types of stochastic orders can be used in place of the increasing concave stochastic order. We also carry out a parallel development for discounted reward MDPs with stochastic dominance constraints. The paper…
Taxonomy
Topics: Risk and Portfolio Optimization · Economic theories and models · Reinforcement Learning in Robotics
