Entropic Risk Constrained Soft-Robust Policy Optimization

Reazul Hasan Russel; Bahram Behzadian; Marek Petrik

arXiv:2006.11679 · cs.LG · June 23, 2020

TL;DR

This paper introduces risk-averse policy optimization algorithms using entropic risk measures to manage model uncertainty in reinforcement learning, demonstrated across various problem domains.

Contribution

It proposes novel entropic risk constrained policy gradient and actor-critic algorithms for risk-averse reinforcement learning under model uncertainty.

Findings

01. Algorithms effectively manage risk in high-stakes domains.

02. Demonstrated improved safety and robustness in multiple environments.

03. Provides a new framework for risk-sensitive policy optimization.

Abstract

Having a perfect model from which to compute the optimal policy is often infeasible in reinforcement learning. In high-stakes domains, it is important to quantify and manage the risk induced by model uncertainty. The entropic risk measure is an exponential utility-based convex risk measure that satisfies many reasonable properties. In this paper, we propose entropic risk constrained policy gradient and actor-critic algorithms that are risk-averse to model uncertainty. We demonstrate the usefulness of our algorithms on several problem domains.
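To make the entropic risk measure mentioned in the abstract concrete, here is a minimal sketch that estimates it from sampled returns. It assumes the common reward-based convention rho_alpha(X) = -(1/alpha) * log E[exp(-alpha * X)] with risk-aversion parameter alpha > 0; the paper's own sign convention and notation may differ. The function name `entropic_risk` and the sample values are hypothetical, chosen only for illustration, and this is not the authors' algorithm.

```python
import numpy as np

def entropic_risk(returns, alpha):
    """Estimate -(1/alpha) * log(mean(exp(-alpha * returns))) from samples.

    Assumed convention: `returns` are rewards (higher is better) and
    alpha > 0 is the risk-aversion level. Not taken from the paper.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    x = np.asarray(returns, dtype=float)
    z = -alpha * x
    m = z.max()  # shift by the max so the exponentials cannot overflow
    log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
    return -log_mean_exp / alpha

# Hypothetical returns of one fixed policy under five sampled models.
sampled_returns = [10.0, 9.0, 11.0, 2.0, 10.5]

mean_return = float(np.mean(sampled_returns))
risk_small = entropic_risk(sampled_returns, alpha=0.01)  # close to the mean
risk_large = entropic_risk(sampled_returns, alpha=1.0)   # pulled toward the worst case
```

Under this convention the value never exceeds the mean return and never falls below the worst sampled return, and it moves toward the worst case as alpha grows. That is the sense in which a constraint on it makes a policy averse to unfavorable models.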

Taxonomy

Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Advanced Bandit Algorithms Research