Central-limit approach to risk-aware Markov decision processes
Pengqian Yu, Jia Yuan Yu, Huan Xu

TL;DR
This paper introduces a method, based on a central limit theorem, for evaluating and optimizing risk in Markov decision processes. It applies whether the transition probabilities are known or unknown, and it includes a gradient-based policy improvement algorithm.
Contribution
The paper presents a risk evaluation framework based on a central limit theorem for MDPs, together with a gradient-based policy improvement method that converges to a local optimum of the risk objective.
Findings
Effective risk evaluation over long horizons
Applicable to known and unknown transition probabilities
Convergent policy improvement algorithm
Abstract
Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated with a given policy over a long-enough time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.
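The central-limit idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm; it rests on several assumptions that do not come from this page: the policy is fixed, so the MDP reduces to a Markov chain with a hypothetical transition matrix P and reward vector r; the asymptotic variance is estimated by batch means from one simulated trajectory (a stand-in for the unknown-transition setting); and the risk measure is taken to be a lower quantile of the cumulative reward, since the abstract does not name one. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_rewards(P, r, n_steps, rng, state=0):
    """Simulate the chain induced by a fixed policy and return per-step rewards."""
    rewards = []
    for _ in range(n_steps):
        rewards.append(r[state])
        # Draw the next state from row `state` of the transition matrix.
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return rewards


def clt_parameters(rewards, batch_size):
    """Estimate the long-run mean reward and the asymptotic variance.

    Batch means (an assumed estimator): the variance of the batch averages,
    scaled by the batch size, approximates the variance in the Markov-chain CLT.
    """
    n_batches = len(rewards) // batch_size
    batch_means = [
        mean(rewards[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
    mu = mean(batch_means)
    sigma2 = batch_size * variance(batch_means)
    return mu, sigma2


def gaussian_quantile(mu, sigma2, horizon, alpha):
    """alpha-quantile of the cumulative reward over `horizon` steps,
    using the normal approximation N(horizon * mu, horizon * sigma2)."""
    approx = NormalDist(horizon * mu, math.sqrt(horizon * sigma2))
    return approx.inv_cdf(alpha)


# Hypothetical two-state example: state 0 pays +1, state 1 pays -1.
P = [[0.9, 0.1], [0.2, 0.8]]
r = [1.0, -1.0]
rng = random.Random(0)
rewards = simulate_rewards(P, r, 200_000, rng)
mu, sigma2 = clt_parameters(rewards, batch_size=1000)
print("mean reward per step:", mu)
print("asymptotic variance:", sigma2)
print("5% quantile over 10,000 steps:", gaussian_quantile(mu, sigma2, 10_000, 0.05))
```

Under these assumptions, a policy would be judged by the quantile rather than by the mean alone: two policies with the same expected reward can differ sharply in asymptotic variance, and the normal approximation makes that difference cheap to evaluate over long horizons.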
Taxonomy
Topics: Simulation Techniques and Applications · Risk and Portfolio Optimization · Advanced Control Systems Optimization
