Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
Yinlam Chow, Mohammad Ghavamzadeh, Lucas Janson, Marco Pavone

TL;DR
This paper develops reinforcement learning algorithms for risk-constrained MDPs that incorporate percentile risk measures like CVaR, providing convergence guarantees and demonstrating effectiveness in practical applications.
Contribution
It introduces novel policy gradient and actor-critic algorithms for risk-constrained MDPs with theoretical convergence proofs.
Findings
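The proposed algorithms optimize policies subject to a chance constraint or a CVaR constraint on the cumulative cost.
Convergence to locally optimal policies is proven.
Effectiveness is demonstrated on an optimal stopping problem and an online marketing application.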
Abstract
In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. Then, we devise policy gradient and actor-critic algorithms that (1) estimate such gradient, (2) update the policy in the descent direction, and (3) update the Lagrange multiplier in…
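To make the two risk criteria named in the abstract concrete, the following is a minimal sketch of how a chance constraint and a CVaR constraint on the cumulative cost could be checked from sampled trajectory costs. It is not the paper's algorithm: the function names `empirical_var_cvar` and `chance_violation`, the confidence level `alpha`, and the cost threshold `beta` are hypothetical, and the CVaR estimate uses the standard Rockafellar-Uryasev representation, CVaR_alpha(Z) = min over nu of nu + E[(Z - nu)^+] / (1 - alpha), evaluated at the empirical alpha-quantile.

```python
import numpy as np


def empirical_var_cvar(costs, alpha):
    """Estimate VaR and CVaR at level alpha from sampled cumulative costs.

    `costs` holds one cumulative cost per sampled trajectory (hypothetical input).
    """
    costs = np.asarray(costs, dtype=float)
    # VaR: the empirical alpha-quantile of the cumulative cost.
    var = np.quantile(costs, alpha)
    # Rockafellar-Uryasev form, evaluated at nu = VaR:
    # CVaR = nu + E[(Z - nu)^+] / (1 - alpha)
    cvar = var + np.mean(np.maximum(costs - var, 0.0)) / (1.0 - alpha)
    return float(var), float(cvar)


def chance_violation(costs, beta):
    """Empirical probability that the cumulative cost is at least beta."""
    costs = np.asarray(costs, dtype=float)
    return float(np.mean(costs >= beta))


if __name__ == "__main__":
    sampled_costs = np.arange(1, 11)  # toy data: costs 1, 2, ..., 10
    var, cvar = empirical_var_cvar(sampled_costs, alpha=0.8)
    print("VaR:", var, "CVaR:", cvar)
    print("P(cost >= 9):", chance_violation(sampled_costs, beta=9.0))
```

On the toy data, the CVaR estimate at level 0.8 equals the mean of the worst 20% of costs (9 and 10), which is the tail-average interpretation of CVaR. A constrained learner would compare such estimates against a user-chosen risk tolerance.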
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization
