Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Prashanth L.A., Michael Fu

TL;DR
This paper surveys recent policy gradient methods for risk-sensitive reinforcement learning in two settings: risk as a constraint on the usual expected-value objective, and risk as the objective itself. It also highlights open challenges and future directions.
Contribution
It provides a comprehensive overview of policy gradient approaches for various risk measures in RL, including a template for risk-sensitive algorithms using Lagrangian methods.
Findings
Survey of risk measures such as variance, conditional value-at-risk (CVaR), and chance constraints
Presentation of a policy gradient template for risk constraints
Discussion of challenges and future research directions
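The Lagrangian template mentioned above can be illustrated with a minimal sketch. It assumes the common primal-dual form: minimize an objective J(theta) subject to a risk constraint G(theta) <= kappa, by descending in theta and ascending in a multiplier lambda on the Lagrangian J(theta) + lambda * (G(theta) - kappa). Everything named here (lagrangian_descent_ascent, the toy functions, the step sizes) is hypothetical and chosen for illustration. The toy problem uses exact gradients of simple quadratics, whereas the surveyed algorithms would use gradient estimates computed from sampled trajectories.

```python
def lagrangian_descent_ascent(grad_j, g, grad_g, kappa,
                              theta=0.0, lam=0.0,
                              step_theta=0.05, step_lam=0.01,
                              iters=5000):
    """Primal-dual iteration on L(theta, lam) = J(theta) + lam * (G(theta) - kappa).

    theta is updated on the faster timescale (larger step size) and
    lam on the slower one; lam is projected onto [0, inf).
    """
    for _ in range(iters):
        # Gradient of the Lagrangian with respect to theta.
        d_theta = grad_j(theta) + lam * grad_g(theta)
        # Gradient with respect to lam is the constraint violation.
        d_lam = g(theta) - kappa
        theta = theta - step_theta * d_theta      # descent in theta
        lam = max(0.0, lam + step_lam * d_lam)    # projected ascent in lam
    return theta, lam


# Toy stand-ins (assumptions, not from the paper):
# J(theta) = (theta - 2)^2 plays the role of expected cost,
# G(theta) = theta^2 plays the role of the risk measure.
def toy_grad_objective(theta):
    return 2.0 * (theta - 2.0)


def toy_risk(theta):
    return theta ** 2


def toy_grad_risk(theta):
    return 2.0 * theta


theta_star, lam_star = lagrangian_descent_ascent(
    toy_grad_objective, toy_risk, toy_grad_risk, kappa=1.0
)
```

With kappa = 1 the unconstrained minimizer theta = 2 violates the constraint, so the iteration should settle near the constrained optimum theta = 1 with multiplier lambda = 1. Using a smaller step size for lambda than for theta mimics the two-timescale structure often used in such schemes.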
Abstract
The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimizes the expected value of a performance metric such as the infinite-horizon cumulative discounted or long-run average cost/reward. In practice, optimizing the expected value alone may not be satisfactory, in that it may be desirable to incorporate the notion of risk into the optimization problem formulation, either in the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., exponential utility, variance, percentile performance, chance constraints, value at risk (quantile), conditional value-at-risk, prospect theory and its later enhancement, cumulative prospect theory. In this book, we consider risk-sensitive RL in two settings: one where the goal is to find a policy that optimizes the usual expected value objective while ensuring that a risk…
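Two of the risk measures listed in the abstract, value at risk and conditional value-at-risk, can be estimated from samples of a cost. The sketch below is illustrative only: the function name empirical_var_cvar and the choice of estimator are assumptions, not taken from the paper. It takes VaR as the empirical alpha-quantile and computes CVaR through the Rockafellar-Uryasev expression VaR + E[max(C - VaR, 0)] / (1 - alpha).

```python
import math


def empirical_var_cvar(costs, alpha):
    """Return (VaR, CVaR) at level alpha from a list of cost samples.

    VaR is the smallest sample whose empirical CDF value is >= alpha.
    CVaR is VaR plus the mean excess over VaR, scaled by 1 / (1 - alpha).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not costs:
        raise ValueError("at least one sample is required")

    xs = sorted(costs)
    n = len(xs)
    k = math.ceil(alpha * n)          # 1-based rank of the alpha-quantile
    var = xs[k - 1]
    mean_excess = sum(max(c - var, 0.0) for c in xs) / n
    cvar = var + mean_excess / (1.0 - alpha)
    return var, cvar
```

For the ten costs 1, 2, ..., 10 at alpha = 0.5, this gives VaR = 5 and CVaR = 8, the average of the five largest costs. The example shows why CVaR is the more conservative of the two: it accounts for how bad the tail is, not just where it begins.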
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Risk and Portfolio Optimization
