A Risk-Sensitive Approach to Policy Optimization
Jared Markowitz, Ryan W. Gardner, Ashley Llorens, Raman Arora, I-Jeng Wang

TL;DR
This paper introduces a risk-sensitive reinforcement learning method that optimizes the full-episode reward distribution, improving safety and performance by emphasizing outcomes where the agent performs poorly.
Contribution
It proposes a direct risk-sensitive policy optimization approach using the CDF of full-episode rewards, applicable to various action spaces and settings, with a novel gradient estimation technique.
Findings
Moderately pessimistic risk profiles enhance exploration.
Risk-sensitive methods reduce costs and improve rewards in safety environments.
Approach outperforms state-of-the-art on-policy methods in experiments.
Abstract
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy. This differs from human decision-making, where gains and losses are valued differently and outlying outcomes are given increased consideration. It also fails to capitalize on opportunities to improve safety and/or performance through the incorporation of distributional context. Several approaches to distributional DRL have been investigated, with one popular strategy being to evaluate the projected distribution of returns for possible actions. We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized. This approach allows for outcomes to be weighed based on relative quality, can be used for both continuous and…
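To make the idea of weighing outcomes by their position in the CDF of full-episode rewards concrete, here is a minimal sketch. It is not the paper's gradient estimator. It assumes a standard distortion-risk-measure form, in which each episode in a batch receives a weight derived from its rank in the empirical CDF. The function names, the distortion g(u) = 1 - (1 - u)^eta, and the parameter `eta` are all illustrative assumptions. With `eta = 1` the weights are uniform and the objective reduces to the ordinary mean return; with `eta > 1` low-return episodes are emphasized (a pessimistic profile).

```python
import numpy as np


def distortion(u, eta):
    """Hypothetical distortion g(u) = 1 - (1 - u)**eta (an assumption, not from the paper).

    eta = 1 gives g(u) = u (risk-neutral); eta > 1 emphasizes low returns.
    """
    return 1.0 - (1.0 - u) ** eta


def episode_weights(returns, eta=2.0):
    """Per-episode weights from the empirical CDF of full-episode returns.

    The episode with the i-th smallest return (1-based) receives weight
    g(i / N) - g((i - 1) / N), so the weights sum to g(1) - g(0) = 1.
    """
    returns = np.asarray(returns, dtype=float)
    n = returns.size
    order = np.argsort(returns, kind="stable")  # episode indices, ascending by return
    upper = distortion(np.arange(1, n + 1) / n, eta)
    lower = distortion(np.arange(0, n) / n, eta)
    weights = np.empty(n)
    weights[order] = upper - lower  # map weights back to the original episode order
    return weights


def risk_sensitive_objective(returns, eta=2.0):
    """CDF-weighted average of episode returns under the assumed distortion."""
    returns = np.asarray(returns, dtype=float)
    return float(np.dot(episode_weights(returns, eta), returns))
```

In a policy-gradient setting, weights of this kind could scale each episode's score-function term in place of the uniform 1/N, though the estimator actually used in the paper may differ from this sketch.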
Taxonomy
Topics: Health Systems, Economic Evaluations, Quality of Life
Methods: Test
