Risk-Sensitive Reinforcement Learning via Policy Gradient Search

Prashanth L.A.; Michael Fu

arXiv:1810.09126·cs.LG·May 25, 2022

TL;DR

This paper surveys recent policy gradient methods for risk-sensitive reinforcement learning in two settings, risk as a constraint and risk as the objective, and outlines open challenges and future directions.

Contribution

It provides a comprehensive overview of policy gradient approaches for various risk measures in RL, including a template for risk-sensitive algorithms using Lagrangian methods.
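As a rough illustration of the Lagrangian idea mentioned above, the sketch below runs gradient descent on a parameter and projected gradient ascent on a multiplier for a constrained problem of the form "minimize J(theta) subject to G(theta) <= alpha". It is not the survey's template: the function name `primal_dual_sketch`, the step sizes, and the closed-form toy objective and constraint are all assumptions made for illustration, whereas a risk-sensitive RL algorithm would replace the exact gradients with estimates obtained from sampled trajectories.

```python
def primal_dual_sketch(grad_J, G, grad_G, alpha,
                       theta=0.0, lam=0.0,
                       theta_step=0.05, lam_step=0.005, iters=20000):
    """Descent on theta and projected ascent on lam for the Lagrangian
    L(theta, lam) = J(theta) + lam * (G(theta) - alpha).

    All names and default values here are hypothetical choices.
    """
    for _ in range(iters):
        # Both updates use the values from the start of the iteration.
        constraint_gap = G(theta) - alpha
        grad_theta = grad_J(theta) + lam * grad_G(theta)
        # The parameter moves on the faster timescale (larger step).
        theta = theta - theta_step * grad_theta
        # The multiplier moves more slowly and is projected onto lam >= 0.
        lam = max(0.0, lam + lam_step * constraint_gap)
    return theta, lam


# Toy stand-ins (assumed): J(theta) = (theta - 2)^2 plays the role of the
# expected cost and G(theta) = theta^2 the role of the risk measure.
theta_star, lam_star = primal_dual_sketch(
    grad_J=lambda t: 2.0 * (t - 2.0),
    G=lambda t: t * t,
    grad_G=lambda t: 2.0 * t,
    alpha=1.0,
)
print(theta_star, lam_star)
```

In this toy problem the unconstrained minimizer theta = 2 violates the constraint, so the iterates settle near the boundary point theta = 1 with a multiplier close to 1. The smaller step for the multiplier mimics the two-timescale structure often used with Lagrangian schemes.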

Findings

01 Survey of risk measures such as variance, CVaR, and chance constraints

02 Presentation of a policy gradient template for risk constraints

03 Discussion of challenges and future research directions

Abstract

The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimizes the expected value of a performance metric such as the infinite-horizon cumulative discounted or long-run average cost/reward. In practice, optimizing the expected value alone may not be satisfactory, in that it may be desirable to incorporate the notion of risk into the optimization problem formulation, either in the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., exponential utility, variance, percentile performance, chance constraints, value at risk (quantile), conditional value-at-risk, prospect theory and its later enhancement, cumulative prospect theory. In this book, we consider risk-sensitive RL in two settings: one where the goal is to find a policy that optimizes the usual expected value objective while ensuring that a risk…
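To make two of the risk measures named in the abstract concrete, here is a minimal sketch that estimates value at risk (a quantile) and conditional value-at-risk from a finite sample of costs. The function name `empirical_var_cvar`, the choice of the lower empirical quantile for VaR, and the sample data are assumptions for illustration; the CVaR estimate uses the standard Rockafellar-Uryasev representation evaluated at the empirical VaR.

```python
import math


def empirical_var_cvar(costs, level):
    """Return (VaR, CVaR) of a cost sample at the given level in (0, 1).

    VaR is the smallest sample value v with at least a `level` fraction of
    the sample <= v. CVaR is v + mean(max(cost - v, 0)) / (1 - level).
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    ordered = sorted(costs)
    n = len(ordered)
    # Index of the lower empirical quantile (0-based).
    k = max(math.ceil(level * n) - 1, 0)
    var = ordered[k]
    # Total amount by which the sample exceeds the VaR threshold.
    excess = sum(max(c - var, 0.0) for c in ordered)
    cvar = var + excess / ((1.0 - level) * n)
    return var, cvar


# Hypothetical sample: costs 1, 2, ..., 10 with equal weight.
sample_costs = [float(c) for c in range(1, 11)]
var_80, cvar_80 = empirical_var_cvar(sample_costs, 0.8)
print(var_80, cvar_80)
```

For this sample the 0.8-level VaR is the eighth smallest cost, and the CVaR averages the costs in the tail beyond it, so CVaR is never smaller than VaR.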

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Risk and Portfolio Optimization