Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

Shangtong Zhang; Bo Liu; Shimon Whiteson

arXiv:2004.10888 · cs.LG · April 8, 2022 · 5 citations

TL;DR

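This paper introduces mean-variance policy iteration (MVPI), a framework for risk-averse reinforcement learning in which existing policy evaluation and risk-neutral control methods can be used off the shelf. Its example instantiation, risk-averse TD3, outperforms vanilla TD3 and many previous risk-averse control methods on MuJoCo robot simulation tasks under a risk-aware performance metric.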

Contribution

The paper proposes MVPI, a framework in which any policy evaluation method and any risk-neutral control method can be reused for risk-averse control, in both on- and off-policy settings and with deterministic policies.

Findings

01

Risk-averse TD3 outperforms vanilla TD3 on MuJoCo tasks under a risk-aware performance metric.

02

Risk-averse TD3 is the first method to bring deterministic policies and off-policy learning into risk-averse RL.

03

Risk-averse TD3 outperforms many previous risk-averse control methods on challenging MuJoCo robot simulation tasks.

Abstract

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite-horizon MDP, optimizing the variance of a per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse control methods in challenging MuJoCo robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are…
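
The idea of reducing risk-averse control to risk-neutral control on an augmented MDP can be illustrated with a small sketch. The abstract does not spell out the construction, so the reward transformation below is an assumption: it applies the standard Fenchel-duality identity x^2 = max_y (2xy - y^2) to a mean-minus-variance objective over per-step rewards. The names `mean_variance_objective`, `augmented_reward`, `surrogate_objective`, `lam`, and `y` are hypothetical and are not taken from the authors' code.

```python
from statistics import fmean, pvariance


def mean_variance_objective(rewards, lam):
    """Mean of the per-step rewards minus lam times their population variance."""
    return fmean(rewards) - lam * pvariance(rewards)


def augmented_reward(r, lam, y):
    """Assumed augmented reward: r - lam * r^2 + 2 * lam * r * y.

    y is an auxiliary scalar; the variance penalty is rewritten with
    x^2 = max_y (2 * x * y - y^2), which is maximized at y = x.
    """
    return r - lam * r * r + 2.0 * lam * r * y


def surrogate_objective(rewards, lam, y):
    """Mean augmented reward minus lam * y^2.

    For lam >= 0 this lower-bounds the mean-variance objective and
    matches it when y equals the mean reward.
    """
    return fmean(augmented_reward(r, lam, y) for r in rewards) - lam * y * y


# Illustrative (made-up) per-step rewards and risk weight.
rewards = [1.0, 2.0, 3.0]
lam = 0.5
y = fmean(rewards)  # closed-form maximizer of the surrogate over y

print(mean_variance_objective(rewards, lam))
print(surrogate_objective(rewards, lam, y))
```

Under these assumptions, with `y` held fixed the augmented rewards are ordinary scalar rewards, so in principle any risk-neutral learner (the abstract names TD3 as the example) could be run on them, alternating with an update of `y`.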

Code & Models

Repositories

ShangtongZhang/DeepRL
PyTorch · Official

Videos

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Neurological disorders and treatments

Methods: Target Policy Smoothing · Adam · Experience Replay · Dense Connections · Clipped Double Q-learning · Twin Delayed Deep Deterministic · Entropy Regularization · Proximal Policy Optimization