Stochastic Approximation for Risk-aware Markov Decision Processes

Wenjie Huang; William B. Haskell

arXiv:1805.04238 · math.OC · December 5, 2019 · 6 citations


TL;DR

This paper introduces a stochastic approximation algorithm for risk-aware Markov decision processes. An inner loop estimates risk by solving a stochastic saddle-point problem, and an outer loop runs Q-learning. The paper establishes convergence guarantees that cover several risk measures.

Contribution

It presents a two-loop stochastic approximation algorithm that handles several risk measures in risk-aware MDPs, with proofs of almost sure convergence and an explicit convergence rate.

Findings

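01. The algorithm converges almost surely.
02. The convergence rate is characterized explicitly.
03. It applies to several risk measures, including conditional value-at-risk (CVaR), the optimized certainty equivalent (OCE), and absolute semi-deviation.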
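CVaR, OCE, and absolute semi-deviation all have simple sample-based forms. What follows is a minimal sketch under assumptions of our own, not the paper's code. Samples are losses, so larger values are worse. OCE uses the loss form $\rho(X) = \inf_\eta \{\eta + E[\phi(X - \eta)]\}$ with a convex $\phi$ satisfying $1 \in \partial\phi(0)$, and CVaR is the special case $\phi(t) = \max(t, 0)/(1 - \alpha)$. Absolute semi-deviation is assumed to be the mean-upper-semideviation $E[X] + c\,E[(X - E[X])_+]$ with $c \in [0, 1]$. The function names `oce`, `cvar`, and `mean_abs_semideviation` are hypothetical. The one-dimensional minimization over $\eta$ uses ternary search, which is valid because the objective is convex and, under the stated condition on $\phi$, has a minimizer between the smallest and largest sample.

```python
import math
from typing import Callable, Sequence


def oce(samples: Sequence[float], phi: Callable[[float], float], iters: int = 200) -> float:
    """Loss-form OCE: inf over eta of eta + mean(phi(x - eta)).

    Assumes phi is convex with 1 in its subdifferential at 0, so a minimizer
    lies in [min(samples), max(samples)].
    """
    n = len(samples)

    def objective(eta: float) -> float:
        return eta + sum(phi(x - eta) for x in samples) / n

    lo, hi = min(samples), max(samples)
    # Ternary search on a convex function: shrink the bracket around a minimizer.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return objective((lo + hi) / 2)


def cvar(samples: Sequence[float], alpha: float) -> float:
    """CVaR at level alpha in [0, 1), as the OCE with phi(t) = max(t, 0) / (1 - alpha)."""
    return oce(samples, lambda t: max(t, 0.0) / (1.0 - alpha))


def mean_abs_semideviation(samples: Sequence[float], c: float) -> float:
    """Assumed form: mean + c * mean of the positive part of (x - mean)."""
    m = sum(samples) / len(samples)
    return m + c * sum(max(x - m, 0.0) for x in samples) / len(samples)
```

As a usage check, `cvar([1.0, 2.0, 3.0, 4.0], 0.5)` averages the worst half of the losses and returns 3.5. Choosing $\phi(t) = e^t - 1$ (via `math.expm1`) makes `oce` reproduce the entropic risk $\ln E[e^X]$.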

Abstract

We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs $Q$-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semi-deviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance $\epsilon > 0$ for the optimal $Q$-value estimation gap and learning rate $k \in (1/2, 1]$, the overall convergence rate of our algorithm is $\Omega\big((\ln(1/\delta\epsilon)/\epsilon^{2})^{1/k} + (\ln(1/\epsilon))^{1/(1-k)}\big)$ with probability at least $1 - \delta$.
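The two-loop structure can be illustrated with a small sketch. It is not the authors' algorithm and rests on several assumptions of our own:

- Costs are minimized.
- The risk measure is CVaR.
- A simulator can draw fresh samples for any state-action pair.
- The problem is a one-state, two-action toy defined by the hypothetical `sample_cost`.

For CVaR, the inner saddle-point problem collapses to the one-dimensional Rockafellar-Uryasev minimization over a threshold $\eta$. The inner loop here (hypothetical `cvar_sa`) therefore runs stochastic subgradient descent on $\eta$ with Polyak averaging rather than the paper's general saddle-point scheme. The outer loop (hypothetical `risk_aware_q_learning`) is synchronous Q-learning with learning rate $1/n^k$, $k \in (1/2, 1]$.

```python
import random
from typing import Callable, List


# Hypothetical toy problem (not from the paper): one state, two actions, costs to minimize.
# Action 0 always costs 1.0; action 1 costs 0.0 or 1.6 with equal probability.
def sample_cost(action: int, rng: random.Random) -> float:
    if action == 0:
        return 1.0
    return 0.0 if rng.random() < 0.5 else 1.6


def cvar_sa(sample: Callable[[random.Random], float], alpha: float,
            n_iter: int, rng: random.Random) -> float:
    """Inner loop: estimate CVaR_alpha of sample() using the Rockafellar-Uryasev form
    CVaR(Y) = min over eta of eta + E[(Y - eta)_+] / (1 - alpha)."""
    eta, eta_sum = 0.0, 0.0
    burn_in = n_iter // 2
    for t in range(1, n_iter + 1):
        y = sample(rng)
        # Stochastic subgradient of the objective with respect to eta.
        grad = 1.0 - (1.0 if y > eta else 0.0) / (1.0 - alpha)
        eta -= grad / t ** 0.6
        if t > burn_in:
            eta_sum += eta  # Polyak averaging over the second half of the run
    eta_bar = eta_sum / (n_iter - burn_in)
    # Evaluate the objective at the averaged threshold using fresh samples.
    total = 0.0
    for _ in range(n_iter):
        total += eta_bar + max(sample(rng) - eta_bar, 0.0) / (1.0 - alpha)
    return total / n_iter


def risk_aware_q_learning(n_outer: int = 200, n_inner: int = 400, gamma: float = 0.5,
                          alpha: float = 0.5, k: float = 0.8, seed: int = 0) -> List[float]:
    """Outer loop: synchronous Q-learning with learning rate 1 / n**k."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    for n in range(1, n_outer + 1):
        v = min(q)  # value of the single next state under the current Q estimate
        step = 1.0 / n ** k
        targets = []
        for a in (0, 1):
            # Risk of the one-step cost plus the discounted cost-to-go.
            target = cvar_sa(lambda r, a=a: sample_cost(a, r) + gamma * v,
                             alpha, n_inner, rng)
            targets.append(target)
        q = [q[a] + step * (targets[a] - q[a]) for a in (0, 1)]
    return q
```

With the default parameters, the exact risk-aware fixed point is $Q^*(0) = 2$ and $Q^*(1) = 2.6$, so a CVaR-averse agent picks the deterministic action 0. A risk-neutral agent would prefer action 1, whose expected cost is only 0.8. Because both loops are stochastic, the returned values only approximate these targets.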


Taxonomy

Topics: Risk and Portfolio Optimization · Statistical Methods and Inference · Advanced Bandit Algorithms Research