Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

Yinlam Chow; Mohammad Ghavamzadeh; Lucas Janson; Marco Pavone

arXiv:1512.01629 · cs.AI · April 7, 2017 · 54 cites

TL;DR

This paper develops reinforcement learning algorithms for risk-constrained MDPs that incorporate percentile risk measures like CVaR, providing convergence guarantees and demonstrating effectiveness in practical applications.

Contribution

It introduces novel policy gradient and actor-critic algorithms for risk-constrained MDPs with theoretical convergence proofs.

Findings

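01 The policy gradient and actor-critic algorithms optimize policies subject to percentile risk constraints.

02 Convergence to locally optimal policies is proven.

03 The algorithms are demonstrated on an optimal stopping problem and an online marketing application.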

Abstract

In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. Then, we devise policy gradient and actor-critic algorithms that (1) estimate such gradient, (2) update the policy in the descent direction, and (3) update the Lagrange multiplier in…
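To make the CVaR constraint concrete, here is a minimal Python sketch of how the quantities named in the abstract could be estimated from sampled cumulative costs. It is an illustration under stated assumptions, not the paper's implementation: the function names, the confidence level `alpha`, the risk tolerance `beta`, the auxiliary threshold `nu`, and the multiplier `lam` are all hypothetical labels, and the CVaR estimate uses the standard Rockafellar-Uryasev form nu + E[(C - nu)^+] / (1 - alpha).

```python
import math


def empirical_var(costs, alpha):
    """Empirical value-at-risk: the smallest sampled cost c such that
    at least a fraction alpha of the samples are <= c."""
    ordered = sorted(costs)
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[k]


def empirical_cvar(costs, alpha, nu=None):
    """Sample estimate of CVaR in Rockafellar-Uryasev form:
    nu + mean(max(C - nu, 0)) / (1 - alpha).
    If nu is not given, the empirical VaR is used as the threshold."""
    if nu is None:
        nu = empirical_var(costs, alpha)
    excess = [max(c - nu, 0.0) for c in costs]
    return nu + (sum(excess) / len(costs)) / (1.0 - alpha)


def lagrangian(costs, alpha, beta, nu, lam):
    """Sample estimate of a Lagrangian for the CVaR-constrained problem:
    expected cost plus lam times the constraint violation (CVaR term - beta).
    The exact form used in the paper is assumed, not quoted."""
    expected_cost = sum(costs) / len(costs)
    return expected_cost + lam * (empirical_cvar(costs, alpha, nu) - beta)


if __name__ == "__main__":
    # Hypothetical cumulative costs from ten sampled trajectories.
    sampled_costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(empirical_var(sampled_costs, 0.8))
    print(empirical_cvar(sampled_costs, 0.8))
    print(lagrangian(sampled_costs, 0.8, beta=9.0, nu=8.0, lam=2.0))
```

A policy gradient method of the kind the abstract describes would differentiate an estimate like `lagrangian` with respect to the policy parameters; the sketch stops at evaluating the quantity itself.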

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization