Entropic Risk Constrained Soft-Robust Policy Optimization
Reazul Hasan Russel, Bahram Behzadian, Marek Petrik

TL;DR
This paper introduces risk-averse policy optimization algorithms using entropic risk measures to manage model uncertainty in reinforcement learning, demonstrated across various problem domains.
Contribution
It proposes novel entropic risk constrained policy gradient and actor-critic algorithms for risk-averse reinforcement learning under model uncertainty.
Findings
The proposed algorithms target risk induced by model uncertainty in high-stakes domains.
The authors demonstrate the usefulness of the algorithms on several problem domains.
The paper formulates policy gradient and actor-critic methods under an entropic risk constraint.
Abstract
Having a perfect model to compute the optimal policy is often infeasible in reinforcement learning. In high-stakes domains, it is important to quantify and manage the risk induced by model uncertainty. The entropic risk measure is an exponential utility-based convex risk measure that satisfies many reasonable properties. In this paper, we propose entropic risk constrained policy gradient and actor-critic algorithms that are risk-averse to model uncertainty. We demonstrate the usefulness of our algorithms on several problem domains.
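To make the entropic risk measure mentioned in the abstract concrete, the following is a minimal sketch of a sample-based estimate. It assumes the common reward-based convention, rho_alpha(X) = -(1/alpha) * log E[exp(-alpha * X)], with risk-aversion parameter alpha > 0; the paper may use a different sign or scaling convention. The function name `entropic_risk` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def entropic_risk(returns, alpha):
    """Sample estimate of -(1/alpha) * log(mean(exp(-alpha * x))).

    Assumed convention: `returns` are rewards (higher is better) and
    alpha > 0 is the risk-aversion level. Illustrative only.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not returns:
        raise ValueError("returns must be non-empty")

    # Log-sum-exp trick: subtract the largest exponent before
    # exponentiating so that large alpha does not overflow or underflow.
    exponents = [-alpha * x for x in returns]
    shift = max(exponents)
    log_sum = shift + math.log(sum(math.exp(e - shift) for e in exponents))

    # log(mean(...)) = log(sum(...)) - log(n)
    log_mean = log_sum - math.log(len(returns))
    return -log_mean / alpha


# Hypothetical returns of one policy under three sampled models.
sampled_returns = [1.0, 2.0, 3.0]
nearly_neutral = entropic_risk(sampled_returns, 1e-6)  # close to the mean
very_averse = entropic_risk(sampled_returns, 50.0)     # close to the worst case
```

Under this convention, a small alpha gives a value near the average return, while a large alpha pushes the value toward the worst sampled return. This is the sense in which the measure is risk-averse to model uncertainty.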
Taxonomy
Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Advanced Bandit Algorithms Research
