A Risk-Sensitive Approach to Policy Optimization

Jared Markowitz; Ryan W. Gardner; Ashley Llorens; Raman Arora; I-Jeng Wang

arXiv:2208.09106 · cs.LG · November 17, 2023 · 1 citation

TL;DR

This paper introduces a risk-sensitive reinforcement learning method that optimizes the full-episode reward distribution, improving safety and performance by emphasizing outcomes where the agent performs poorly.

Contribution

It proposes a direct risk-sensitive policy optimization approach using the CDF of full-episode rewards, applicable to various action spaces and settings, with a novel gradient estimation technique.
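
One common way to express an objective through the CDF of full-episode rewards is a distortion-weighted average: episodes are ranked by total reward, and each rank is weighted by how much a chosen function g (with g(0) = 0 and g(1) = 1) grows across that step of the empirical CDF. The sketch below illustrates that reading only. It is not taken from the paper or its repository, and the function names and the particular pessimistic profile are hypothetical choices made for the example.

```python
def distortion_weights(n, distortion):
    """Per-rank weights (worst episode first) from a distortion function g on [0, 1]."""
    levels = [i / n for i in range(n + 1)]        # empirical CDF levels 0, 1/n, ..., 1
    g = [distortion(u) for u in levels]
    return [g[i + 1] - g[i] for i in range(n)]    # g(i/n) - g((i-1)/n)


def risk_sensitive_value(episode_returns, distortion):
    """Distortion-weighted average of full-episode returns."""
    ordered = sorted(episode_returns)             # ascending, so rank 0 is the worst episode
    weights = distortion_weights(len(ordered), distortion)
    return sum(w * r for w, r in zip(weights, ordered))


def neutral(u):
    # g(u) = u gives equal weights, which recovers the ordinary mean
    return u


def pessimistic(u, k=2.0):
    # Hypothetical profile: concave g, so poorly performing episodes get larger weights
    return 1.0 - (1.0 - u) ** k


returns = [10.0, 2.0, 6.0, 4.0]                   # made-up full-episode rewards
mean_value = risk_sensitive_value(returns, neutral)
pess_value = risk_sensitive_value(returns, pessimistic)
```

With the neutral profile every episode counts equally and the value is the plain average. With the concave profile the worst episode in this batch of four receives weight 0.4375 and the best only 0.0625, so the objective is pulled toward outcomes where the agent performs poorly.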

Findings

1. Moderately pessimistic risk profiles enhance exploration.

2. Risk-sensitive methods reduce costs and improve rewards in safety environments.

3. The approach outperforms state-of-the-art on-policy methods in the reported experiments.

Abstract

Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy. This differs from human decision-making, where gains and losses are valued differently and outlying outcomes are given increased consideration. It also fails to capitalize on opportunities to improve safety and/or performance through the incorporation of distributional context. Several approaches to distributional DRL have been investigated, with one popular strategy being to evaluate the projected distribution of returns for possible actions. We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized. This approach allows for outcomes to be weighed based on relative quality, can be used for both continuous and…

Code & Models

Repositories

JHU-APL-ISC-Deep-RL/risk-sensitive
PyTorch · Official

Videos

A Risk-Sensitive Approach to Policy Optimization · Underline

Taxonomy

Topics: Health Systems, Economic Evaluations, Quality of Life

Methods: Test