Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning

Xiaoteng Ma; Shuai Ma; Li Xia; Qianchuan Zhao

arXiv:2206.07376 · cs.LG · March 9, 2023

TL;DR

This paper introduces risk-averse reinforcement learning methods optimizing the mean-semivariance criterion, addressing the challenge of semivariance's time-inconsistency with novel algorithms validated through diverse experiments.

Contribution

It develops a new theoretical framework for MSV optimization in RL using Perturbation Analysis and proposes two practical on-policy algorithms based on policy gradients and trust regions.

Findings

1. Algorithms outperform baseline methods in various tasks.

2. MSV optimization improves risk management in RL.

3. Proposed methods are effective in both simple and complex environments.

Abstract

Keeping risk under control is often more crucial than maximizing expected rewards in real-world decision-making settings such as finance, robotics, and autonomous driving. The most natural choice of risk measure is variance, which penalizes upside volatility as much as downside volatility. The (downside) semivariance, which captures only the negative deviation of a random variable below its mean, is therefore more suitable for risk-averse purposes. This paper aims at optimizing the mean-semivariance (MSV) criterion in reinforcement learning w.r.t. the steady reward distribution. Since semivariance is time-inconsistent and does not satisfy the standard Bellman equation, traditional dynamic programming methods are not directly applicable to MSV problems. To tackle this challenge, we resort to Perturbation Analysis (PA) theory and establish the performance difference formula for MSV. We…
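To make the contrast between variance and downside semivariance concrete, here is a minimal Python sketch on two hand-made reward samples. It is an illustration only, not the paper's algorithm: the helper names, the sample data, and the trade-off weight `beta` are assumptions introduced here, and the objective is written as mean minus `beta` times semivariance, which is one common way to state a mean-semivariance criterion.

```python
import statistics


def downside_semivariance(rewards):
    """Average squared shortfall below the sample mean (hypothetical helper)."""
    mean = statistics.fmean(rewards)
    # Only deviations below the mean contribute; upside deviations count as 0.
    return sum(min(r - mean, 0.0) ** 2 for r in rewards) / len(rewards)


def mean_semivariance(rewards, beta):
    """Assumed MSV objective: mean reward minus beta times downside semivariance."""
    return statistics.fmean(rewards) - beta * downside_semivariance(rewards)


# Two illustrative samples with the same mean (2.0) and the same variance (3.0).
upside = [1.0, 1.0, 1.0, 5.0]     # one large positive surprise
downside = [3.0, 3.0, 3.0, -1.0]  # one large negative surprise

for name, sample in [("upside", upside), ("downside", downside)]:
    print(
        name,
        "variance:", statistics.pvariance(sample),
        "semivariance:", downside_semivariance(sample),
        "MSV (beta=1):", mean_semivariance(sample, beta=1.0),
    )
```

Variance cannot tell the two samples apart, while the semivariance is 0.75 for the upside sample and 2.25 for the downside one, so a mean-semivariance objective prefers the sample whose volatility lies above the mean.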
