A Natural Actor-Critic Algorithm with Downside Risk Constraints

Thomas Spooner; Rahul Savani

arXiv:2007.04203 · cs.LG · July 9, 2020 · 1 citation




TL;DR

This paper introduces a risk-sensitive reinforcement learning algorithm that estimates downside risk, measured by the lower partial moment of the return, through a new Bellman equation. Compared with direct Monte-Carlo estimation, this improves sample efficiency and stability in constrained policy optimization.

Contribution

The paper proposes a new Bellman equation for a proxy that upper bounds the lower partial moment of the return, proves that this proxy is a contraction, and extends an actor-critic method with natural policy gradients for risk-sensitive control.

Findings

01 Effective on three benchmark problems

02 Improved sample efficiency and stability

03 Demonstrates practical utility of the new risk proxy

Abstract

Existing work on risk-sensitive reinforcement learning - both for symmetric and downside risk measures - has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also suffers from high variance and decreased sample efficiency compared to temporal-difference methods. In this paper, we study prediction and control with aversion to downside risk which we gauge by the lower partial moment of the return. We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity. We prove that this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition. This allows sample-efficient, on-line estimation of partial moments. For risk-sensitive control, we instantiate Reward Constrained Policy Optimization, a…
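For orientation, the lower partial moment of order m about a target τ is conventionally defined as E[max(τ − G, 0)^m], where G is the return. The sketch below shows the direct Monte-Carlo baseline that the abstract contrasts against, not the paper's Bellman-based estimator. The function names, the default target of 0, the default order of 2, and the discount factor are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def discounted_return(rewards, gamma=0.99):
    """Discounted sum of one episode's rewards (gamma is an assumed value)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def lower_partial_moment(returns, target=0.0, order=2):
    """Monte-Carlo estimate of E[max(target - G, 0) ** order].

    Only returns that fall below `target` contribute, which is what makes
    this a downside (asymmetric) risk measure rather than a variance.
    """
    returns = np.asarray(returns, dtype=float)
    shortfall = np.maximum(target - returns, 0.0)
    return float(np.mean(shortfall ** order))


if __name__ == "__main__":
    # Hypothetical episodes, purely for illustration.
    episodes = [[1.0, 1.0, 1.0], [0.5, -2.0, 0.0], [-1.0, -1.0, -1.0]]
    returns = [discounted_return(ep, gamma=0.9) for ep in episodes]
    print(lower_partial_moment(returns, target=0.0, order=2))
```

An estimate like this needs complete episode returns before it can be updated, which is the variance and sample-efficiency cost the abstract attributes to Monte-Carlo methods.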


Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization