Stochastic Approximation for Risk-aware Markov Decision Processes
Wenjie Huang, William B. Haskell

TL;DR
This paper introduces a stochastic approximation algorithm for solving risk-aware Markov decision processes. It pairs an inner stochastic saddle-point solver, which estimates the risk, with outer-loop Q-learning, and provides convergence guarantees for several risk measures.
Contribution
It presents a novel two-loop stochastic approximation algorithm that handles multiple risk measures in risk-aware MDPs with proven convergence properties.
Findings
Algorithm converges almost surely.
Convergence rate is explicitly characterized.
Applicable to multiple risk measures like CVaR and OCE.
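As background for the risk measures listed above, here are their standard variational forms: Ben-Tal and Teboulle's optimized certainty equivalent (OCE) and the Rockafellar-Uryasev representation of CVaR. They are written for a cost X, where larger values are worse, with a convex, nondecreasing loss φ satisfying φ(0) = 0 and 1 ∈ ∂φ(0). This is a reminder of textbook definitions, not the paper's notation, and its sign conventions may differ. CVaR is the OCE generated by the piecewise-linear loss φ(t) = (t)_+ / (1 - α):

```latex
\mathrm{OCE}_{\phi}(X) = \inf_{\eta \in \mathbb{R}} \Big\{ \eta + \mathbb{E}\big[\phi(X - \eta)\big] \Big\},
\qquad
\phi(t) = \frac{(t)_{+}}{1-\alpha}
\;\Longrightarrow\;
\mathrm{CVaR}_{\alpha}(X) = \inf_{\eta \in \mathbb{R}} \Big\{ \eta + \frac{1}{1-\alpha}\,\mathbb{E}\big[(X - \eta)_{+}\big] \Big\}
```

Here α ∈ [0, 1) and (t)_+ = max(t, 0). The α-quantile VaR_α(X) attains the infimum in the CVaR formula, so estimating CVaR amounts to a one-dimensional stochastic optimization over η.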
Abstract
We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs Q-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semi-deviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance for the optimal Q-value estimation gap and learning rate , the overall convergence rate of our algorithm is with probability at least .
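The two-loop structure described in the abstract can be illustrated with a small sketch. It is built on assumptions of our own and is not the authors' algorithm. We take CVaR_α as the risk measure. In its Rockafellar-Uryasev form, the inner problem reduces to a one-dimensional minimization over η, so the inner loop here is a stochastic subgradient step on η plus a running average of the objective. We also assume the nested Bellman equation Q(s,a) = CVaR_α[c(s,a) + γ min_{a'} Q(s',a')] with s' ~ P(·|s,a), sweep all state-action pairs synchronously, and choose the step sizes k^(-0.75) (inner) and t^(-0.6) (outer) ourselves. The function names `exact_cvar`, `cvar_sa`, `risk_aware_q_learning`, and `risk_aware_value_iteration` and the 2-state MDP are hypothetical. The last function runs exact risk-aware value iteration as a reference.

```python
# Hedged sketch of a two-loop risk-aware Q-learning scheme (not the paper's
# exact algorithm). Assumptions: costs (lower is better), risk measure CVaR_alpha,
# nested Bellman equation Q(s,a) = CVaR_alpha[c(s,a) + gamma * min_a' Q(s',a')].
import numpy as np


def exact_cvar(values, probs, alpha):
    """Exact CVaR_alpha of a discrete cost distribution via
    min_eta { eta + E[(Y - eta)_+] / (1 - alpha) }. The objective is convex and
    piecewise linear with kinks at the support points, so scanning them suffices."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(min(eta + probs @ np.maximum(values - eta, 0.0) / (1.0 - alpha)
                     for eta in values))


def cvar_sa(samples, alpha, eta0=0.0):
    """Inner loop: stochastic approximation of CVaR_alpha from cost samples.
    Returns (cvar_estimate, final_eta) so eta can warm-start the next call."""
    eta, value = eta0, 0.0
    for k, y in enumerate(samples, start=1):
        # Running average of the sampled objective eta + (y - eta)_+ / (1 - alpha).
        value += (eta + max(y - eta, 0.0) / (1.0 - alpha) - value) / k
        # Stochastic subgradient step on eta; its target is VaR_alpha.
        eta -= (1.0 - (y > eta) / (1.0 - alpha)) / k ** 0.75
    return value, eta


def risk_aware_q_learning(P, c, gamma, alpha, n_outer, n_inner, rng):
    """Outer loop: synchronous Q-learning-style averaging of inner CVaR estimates.
    P has shape (S, A, S) with transition probabilities; c has shape (S, A)."""
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    eta = np.zeros((S, A))  # per-pair warm starts for the inner loop (our choice)
    for t in range(1, n_outer + 1):
        beta = 1.0 / t ** 0.6  # outer learning rate (our choice)
        v = Q.min(axis=1)  # greedy cost-to-go under the current Q
        Q_new = Q.copy()
        for s in range(S):
            for a in range(A):
                nxt = rng.choice(S, size=n_inner, p=P[s, a])
                targets = (c[s, a] + gamma * v[nxt]).tolist()
                risk, eta[s, a] = cvar_sa(targets, alpha, float(eta[s, a]))
                Q_new[s, a] = (1.0 - beta) * Q[s, a] + beta * risk
        Q = Q_new
    return Q


def risk_aware_value_iteration(P, c, gamma, alpha, n_iters=200):
    """Reference solution: exact risk-aware value iteration with exact CVaR."""
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(n_iters):
        v = Q.min(axis=1)
        Q = np.array([[exact_cvar(c[s, a] + gamma * v, P[s, a], alpha)
                       for a in range(A)] for s in range(S)])
    return Q


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP, made up for illustration only.
    P = np.array([[[0.5, 0.5], [0.9, 0.1]],
                  [[0.2, 0.8], [0.7, 0.3]]])
    c = np.array([[0.2, 0.5], [1.0, 0.3]])
    Q_sa = risk_aware_q_learning(P, c, gamma=0.5, alpha=0.5, n_outer=50,
                                 n_inner=2000, rng=np.random.default_rng(0))
    Q_ref = risk_aware_value_iteration(P, c, gamma=0.5, alpha=0.5)
    print("SA estimate:\n", np.round(Q_sa, 3))
    print("max abs gap to exact VI:", float(np.abs(Q_sa - Q_ref).max()))
```

Warm-starting η for each state-action pair across outer iterations is our own design choice. It keeps each inner loop from restarting far from VaR_α. Because the sampled objective never falls below CVaR_α, the inner estimates are biased slightly upward, and that bias shrinks as `n_inner` grows.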
Taxonomy
Topics: Risk and Portfolio Optimization · Statistical Methods and Inference · Advanced Bandit Algorithms Research
